THE BUTYROPHILIN GENE PROMOTER AND USES THEREOF

FIELD OF THE INVENTION 

The present invention relates generally to the butyrophilin gene promoter. More
specifically, the present invention relates to the use of the butyrophilin gene promoter for the
production of heterologous proteins in the milk of transgenic animals and for the detection of
carcinogenic substances. Applicants hereby incorporate by reference the subject matter of U.S.
Serial No. 60/022,563.

BACKGROUND OF THE INVENTION 

Butyrophilin is the major integral protein associated with the fat-globule membrane
(FGM) in the milk of many species and is believed to play a role in the mechanism of milk
secretion. See Franke et al., J. Cell Biol. 89: 485-494 (1981); Jack and Mather, J. Biol.
Chem. 265: 14481-14486 (1990); and Jack and Mather, J. Dairy Sci. 76: 3832-3850 (1993);
each of which is herein incorporated by reference. Expressed on the apical surfaces of
mammary epithelial cells, butyrophilin is a type I glycoprotein, comprising a glycosylated
exoplasmic domain, a membrane anchor approximately in the middle of the sequence, and a
long cytoplasmic tail.

Butyrophilin is a member of the immunoglobulin superfamily (IgSF) (Gardinier et al.,
J. Neurosci. Res. 33: 177-187 (1992)), with closest structural homology in the exoplasmic
domain to the B7.1 (CD80) and B7.2 (CD86) receptors (Linsley et al., Protein Sci. 3:
1341-1343 (1994)). Hallmarks of these proteins are two exoplasmic immunoglobulin-like
domains: one of the variable (V) or intermediate (I) type (Williams and Barclay, Ann. Rev.
Immunol. 6: 381-405 (1988); Harpaz and Chothia, J. Mol. Biol. 238: 528-539 (1994)) close to
the N-terminus, and one of the constant (C) type (Williams and Barclay, 1988) close to the
membrane anchor. Other proteins that are homologous with butyrophilin in the exoplasmic
domain include myelin oligodendrocyte glycoprotein (MOG), a component of the myelin sheath
(Gardinier et al., 1992), and the chicken B-G antigens associated with the avian major
histocompatibility complex (Miller et al., Proc. Natl. Acad. Sci. U.S.A. 88: 4377-4381
(1991)). MOG and the B-G antigens have shorter exoplasmic domains with one V-set
immunoglobulin-like fold (Gardinier et al., 1992; Miller et al., 1991). The inclusion of
butyrophilin in the IgSF and the B-G antigen system suggests that butyrophilin has immune
functions.

The C-terminal cytoplasmic domain of butyrophilin is similar to the C-termini of a
group of proteins that contain zinc finger and coiled-coil domains. These proteins may bind
nucleic acids or proteins (Bellini et al., J. Cell Biol. 131: 563-570 (1995)) and include ret
finger protein (RFP) (Takahashi et al., Mol. Cell. Biol. 8: 1853-1856 (1988)), nuclear antigen
A of Sjögren's syndrome (SSA/Ro) (Chan et al., J. Clin. Invest. 87: 68-76 (1991)), Xenopus
nuclear factor 7 (XNF7) (Reddy et al., Develop. Biol. 148: 107-116 (1991)), PwA33 from
Pleurodeles waltl (Bellini et al., EMBO J. 12: 107-114 (1993)), and acid finger protein (AFP)
(Chu et al., Genomics 29: 229-239 (1995)). At the DNA level, this homologous region
encompasses an exon, named B30.2, which was mapped together with the MOG, RFP and
butyrophilin genes to the human MHC class I region of chromosome 6 (Vernet et al., J. Mol.
Evol. 37: 600-612 (1993)). Based on these observations, Vernet et al. (1993) suggested that
the butyrophilin gene evolved in the MHC by the shuffling of exons between an ancestral MOG
gene, which gave rise to the exon encoding the I-set immunoglobulin-like domain of
butyrophilin, and an ancestral RFP gene, which gave rise to the B30.2 region of the
butyrophilin gene.

Butyrophilin is specifically expressed in mammary tissue, with expression being 
maximal during lactation. This mammary-specific expression of the butyrophilin gene is 
assumed to be under the control of the butyrophilin promoter. Since butyrophilin constitutes
a significant portion of the total protein associated with the milk FGM of many species (for
example, more than 40% of the total FGM-associated protein in bovine milk is butyrophilin),
the butyrophilin promoter is an attractive mammary-specific promoter for producing
heterologous protein in the milk of transgenic mammals.



Promoters of other mammary-specific genes, i.e., the casein, whey acidic protein,
α-lactalbumin, and β-lactoglobulin genes, have been used to direct the production of foreign
proteins in the milk of transgenic animals. Recent analysis of these mammary-specific gene
promoters has led to the identification of a number of potentially important regulatory elements
which mediate the lactogenic response. These elements include binding sites for the following:
CTF/NF1 in the β-lactoglobulin (Watson et al., Nucl. Acids Res. 19: 6603-6610 (1991)) and
whey acidic protein genes (Li and Rosen, Mol. Cell. Biol. 15: 2063-2070 (1995)); Oct 1 in the
bovine αs2-casein gene (Groenen et al., Nucl. Acids Res. 20: 4311-4318 (1992)); a
single-stranded nucleic acid binding protein which negatively regulates the β-casein gene
(Altiok and Groner, Mol. Cell. Biol. 14: 6004-6012 (1994)); Ets-related proteins which
stimulate (Welte et al., Eur. J. Biochem. 223: 997-1006 (1994)), and unidentified factor(s)
which negatively regulate, the whey acidic protein gene (Kolb et al., J. Cell. Biochem. 56:
245-261 (1994)); and a pregnancy-specific protein which modulates progesterone-mediated
repression of the mouse β-casein gene (Lee and Oka, J. Biol. Chem. 267: 5797-5801 (1992)).
Several genes, including the most intensively studied rodent β-casein gene promoters, contain
C/EBP (Doppler et al., J. Biol. Chem. 270: 17962-17969 (1995); Raught et al., Molec.
Endocrinol. 9: 1223-1232 (1995)), YY1 (Meier and Groner, Mol. Cell. Biol. 14: 128-137 (1994);
Raught et al., Mol. Cell. Biol. 14: 1752-1763 (1994)), MGF/STAT5 (Watson et al., Nucl. Acids
Res. 19: 6603-6610 (1991); Groenen et al., 1992; Wakao et al., EMBO J. 13: 2182-2191 (1994))
and glucocorticoid response elements (Raught et al., 1995). In addition, the promoter of the
housekeeping gene, β1,4-galactosyltransferase, contains binding sites for AP-2 and CTF/NF1,
which regulate the synthesis of a mammary-specific 3.9 kb transcript (Rajput et al., J. Biol.
Chem. 271: 5131-5142 (1996)).

The basis for mammary-specific expression is poorly understood in any system. 
A so-called "milk-box" sequence, first identified in the proximal α-lactalbumin gene promoter
in several species, is also conserved in many of the casein genes (Laird et al., Biochem. J. 254:
85-94 (1988)), and encompasses binding sites for YY1, STAT5 (Meier and Groner, 1994;
Raught et al., 1994) and C/EBP isoforms (Doppler et al., 1995; Raught et al., 1995). Also,
there are three conserved sequences in the casein genes referred to as blocks A, B, and C
(Yoshimura, M. and Oka, T., Gene 78: 267-275). Raught et al. (1995) have recently
suggested that casein gene expression is regulated by composite response elements (CoREs)
comprising STAT5 and glucocorticoid response elements and C/EBP binding sites.

For the first time, the inventors have cloned and sequenced the mouse butyrophilin 
gene, including its promoter region and have found that the promoter sequence has no 
significant similarities with the published sequences of these other mammary-specific 
promoters.

Analysis of the butyrophilin promoter sequence showed that the butyrophilin promoter
contains many potential regulatory elements associated with immune system genes, including
α- and γ-interferon response elements, and consensus sequences for TCF-1 and PU.1. (PU.1
is a macrophage and B cell-specific transcription factor related to the ets oncogene. See
Klemsz et al., Cell 61: 113-125 (1990).) In addition, the inventors have found that the
proximal region of the butyrophilin promoter contains a repeat element of three
granulocyte-macrophage colony-stimulating factor (GMCSF) sites which in the same context has
been shown to regulate the mitogen-inducible expression of GMCSF in T cells (Nimer et al.,
Mol. Cell. Biol. 10: 6084-6088 (1990), herein incorporated by reference). Thus, the
butyrophilin promoter is also useful for the detection of carcinogenic substances.

BRIEF DESCRIPTION OF THE INVENTION 

The present invention provides the sequence of the 5' flanking region and 
transcriptional unit of the mouse butyrophilin gene (Btn). In particular, it provides the Btn 
promoter and transcriptional regulatory elements contained therein.



Accordingly, an object of the invention is an isolated and purified DNA fragment
comprising a DNA sequence encoding a polypeptide having the biological activity of a 
butyrophilin protein. 



Another object of the invention is an isolated and purified DNA fragment comprising 
a DNA sequence having the biological activity of a butyrophilin promoter. 

An additional object of the present invention is a rDNA construct for expressing a
polypeptide in the mammary gland of a mammal. The rDNA construct comprises a
butyrophilin promoter operatively linked to the DNA sequence encoding a desired polypeptide.
The rDNA construct may also have a DNA sequence encoding a signal sequence operatively
linked to the DNA sequence encoding the polypeptide. Preferably, the signal sequence is a
milk protein signal sequence. The DNA construct may also include the transcriptional unit
and/or 3' flanking sequence of the butyrophilin gene.

It is a further object of this invention to provide a transgenic animal which produces a 
desired polypeptide in its mammary gland. This is achieved by introducing a rDNA construct 
comprising a butyrophilin promoter operatively linked to the DNA sequence encoding the 
polypeptide into at least the mammary epithelial cells of the mammal. Alternatively, the rDNA 
construct may be introduced into a germ line of a mammal, so that subsequent generations will
also express the desired polypeptide in their milk.

Another aspect of the present invention is the use of the mitogen-inducible elements in 
the butyrophilin promoter to detect mitogenic properties of potential carcinogens from a variety 
of sources. For example, substances found in the environment or isolated from food sources 
could be tested for carcinogenicity. The mitogenic properties of a substance are assessed by 
detecting activation of the butyrophilin promoter in cells exposed to the substance, either by 
detection of butyrophilin mRNA or protein, or by detecting expression of a reporter gene under 
the control of a butyrophilin promoter. 

Yet another object of the invention is diagnosis of disease states such as breast cancer 
by screening mammary and nonmammary tissues of nonlactating animals for the expression 
of butyrophilin.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a schematic representation of the λBtn1 clone, showing the location of
subclones prepared from λBtn1 which were used to generate the sequence of the mouse
butyrophilin gene and 5' flanking region.

FIG. 1B is a schematic representation of the structure of the mouse Btn gene, showing
the location of its exons and introns.

FIG. 1C is a schematic representation of mouse butyrophilin cDNA, showing the
location of the cDNA subclones used to sequence mouse butyrophilin cDNA.

FIG. 2 A-C shows the location of putative regulatory elements in the proximal 5'
flanking region of the mouse butyrophilin gene.

DETAILED DESCRIPTION OF THE INVENTION 

The invention provides the sequence of the mouse butyrophilin gene, and approximately
4.6 kb of sequence of its 5' flanking region, which is also referred to as the butyrophilin
promoter. These sequences were obtained from a clone isolated from a murine genomic library
as described below.

Example 1: Cloning the Mouse Butyrophilin Gene

Screening of genomic library and cloning of λBtn1. A 129 ES cell genomic library in
Lambda DASH® (Stratagene, La Jolla, CA) was screened with a 2.3 kb XhoI-XbaI fragment
of cDNA encoding bovine butyrophilin (Jack and Mather, 1990). Plaque DNA (total of
500,000 pfu) was transferred to nylon membranes (Dupont, Boston, MA), denatured in 0.5 N
NaOH, neutralized and cross-linked to the membranes by exposure to ultraviolet light using
a UV Stratalinker® 1800 (Stratagene). Membranes were incubated for 2 h at 42°C in prewash
solution (Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd Edition, Cold Spring
Harbor Laboratory Press, NY, (1989)), followed by 6 h at 42°C in pre-hybridization solution,
and were then incubated overnight at 42°C in hybridization solution with the bovine
butyrophilin cDNA fragment, which had been labelled with [α-32P]-dCTP to a specific activity
of 10* cpm/µg by the random priming method (Feinberg and Vogelstein, Anal. Biochem. 132:
6-13 (1983)). Filters were briefly rinsed with 2X SSC (Sambrook et al., 1989) and then
washed three times at 55°C with 2X SSC containing 0.1% (w/v) SDS, for 20 min each time,
and the cDNA bound to the membranes was detected by exposure to X-ray film, overnight, at
-80°C. One potentially positive plaque was detected from a total of 500,000 pfu screened and
this cloned DNA was designated λBtn1, which has been deposited with the American Type
Culture Collection as ATCC designation 97513.

To confirm that λBtn1 contained the mouse butyrophilin gene, samples of the cloned
DNA (λBtn1), mouse, and bovine genomic DNA were digested with either EcoRI or HindIII
and subjected to Southern analysis using the [32P]-labelled 2.3 kb XbaI fragment of heterologous
bovine cDNA as a probe. In each case, digestion with the restriction endonucleases generated
a characteristic pattern of DNA fragments which hybridized to the [32P]-labelled probe.
Similar patterns of radiolabelled bands were detected in the genomic DNA and λBtn1 samples
(data not shown). As the sequence of λBtn1 became available, a mouse cDNA probe,
mcDNA3, encoding the 3' end of exon 3 through the first 396 bp of exon 7 (see FIG. 1C) was
prepared by RT-PCR. A Southern blot, similar to those described above and probed with this
homologous cDNA, confirmed the identity of λBtn1 (data not shown).

Example 2: Sequencing the Mouse Butyrophilin Gene

Subclones spanning over 14 kb of λBtn1 DNA were prepared (see FIG. 1A) and
sequenced on both strands using the fmol® sequencing kit from Promega Corp. (Madison, WI).
Autoradiographs were scanned with a Molecular Dynamics Computing Densitometer and the
sequences read using the ImageQuant®, Version 3.30 software package (Molecular Dynamics,
Sunnyvale, CA). The computer program MACAW (Schuler et al., 1991) was used to compile
the full-length sequence from the sequencing gels and the sequence is shown in SEQ ID NO:1.
The entire Btn sequence has been deposited in the GenBank Data Base under Accession No.
U67065. The butyrophilin promoter is contained within the first 4,693 nucleotides of SEQ ID
NO:1. The proximal part (first 1750 nucleotides) of this region is shown in SEQ ID NO:2
and schematically illustrated in FIG. 2, with the nucleotides being renumbered in conventional
format, i.e., where the most proximal transcriptional start site (see below) is designated +1.

Example 3: Expression of the Mouse Butyrophilin Gene

Mapping the 5' end of mouse butyrophilin mRNA. The transcriptional start sites were
identified by primer extension analysis using a 32P-labelled primer having the sequence
5'-GGGCTCTGTATTTCCCCTAC-3' (SEQ ID NO:3), and total RNA from day 14 lactating
mammary gland. This primer extension assay was adapted from Roussel et al. (DNA Cell Biol.
14: 777-788 (1995)), which is herein incorporated by reference. Three major labelled products
were obtained from this primer extension experiment, suggesting that transcription of Btn is
initiated from at least three sites, at nucleotides -83, -19 and +1 (FIG. 2) (residues 4611,
4675 and 4694 of SEQ ID NO:1), with the most frequently used site at nucleotide T, designated
position -83 in FIG. 2.
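
The correspondence between the FIG. 2 numbering and the residue numbers of SEQ ID NO:1 can be sketched as a small conversion. The sketch assumes, as the three mapped start sites imply, that +1 corresponds to residue 4694 and that the conventional numbering has no position 0. The function names are illustrative only and are not part of the disclosure.

```python
# Residue of SEQ ID NO:1 designated +1 in FIG. 2 (the most proximal start site).
PLUS_ONE = 4694


def to_seq_id(position):
    """Convert a FIG. 2 position (e.g. -83 or +1) to a SEQ ID NO:1 residue."""
    if position == 0:
        raise ValueError("conventional numbering has no position 0")
    # Upstream positions count back from +1; downstream positions count forward.
    return PLUS_ONE + position if position < 0 else PLUS_ONE + position - 1


def to_fig2(residue):
    """Convert a SEQ ID NO:1 residue to the FIG. 2 numbering."""
    offset = residue - PLUS_ONE
    return offset if offset < 0 else offset + 1


# The three start sites reported in the text.
for site in (-83, -19, 1):
    print(site, to_seq_id(site))
```

Applied to the three start sites, the conversion reproduces residues 4611, 4675 and 4694 as stated above.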

All three transcription start sites are close to, or within the context of, the initiator
element 5'-YYA+1NWYY-3' (Javahery et al., Mol. Cell. Biol. 14: 116-127 (1994), herein
incorporated by reference), which can mediate the initiation of transcription in genes lacking
conventional TATAA and CCAAT boxes. Two of these sites, at positions -83 and -19, contain
one and three mismatches, respectively, from the consensus sequence, and the site at position
-83 is two nucleotides downstream of the more usual A+1 start site. The most proximal start
site at nucleotide +1 is within a perfect consensus, although paradoxically it does not appear
to be the most frequently used.
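
Counting mismatches against a degenerate consensus of this kind can be sketched as follows. The sketch assumes the standard IUPAC meanings of the consensus letters (Y = C or T, W = A or T, N = any base). The candidate 7-mers are invented for illustration and are not taken from SEQ ID NO:1.

```python
# IUPAC codes appearing in the initiator consensus 5'-YYANWYY-3'.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT",    # pyrimidine
    "W": "AT",    # weak pair
    "N": "ACGT",  # any base
}


def count_mismatches(site, consensus="YYANWYY"):
    """Count positions where `site` is not allowed by the IUPAC consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base not in IUPAC[code]
               for base, code in zip(site.upper(), consensus))


# Hypothetical 7-mers, for illustration only.
for candidate in ("CTACTCT", "GTACTCT", "GGAGGGG"):
    print(candidate, count_mismatches(candidate))
```

A site with zero mismatches corresponds to what the text calls a perfect consensus.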

Although Btn does not have conventional TATAA elements, two AT-rich regions,
5'-TGTAAAT-3' at position -49 (nucleotides 4645-4651 of SEQ ID NO:1), and 5'-TCTAAA-3'
at position -106 (nucleotides 4583-4588 of SEQ ID NO:1), are within 20-25 nucleotides of the
two weaker initiator elements. In common with other genes, these regions may cooperatively
strengthen the initiation of transcription via the TATA- and initiator-binding proteins (Javahery
et al., 1994), and this may explain why the start site at position -83 appears to be the most
frequently used site. Interestingly, this latter site is closest to the sequence 5'-TCTAAA-3'
(position -106 of FIG. 2), which is a characteristic TATA element in many human MHC class
I genes (Le Boutellier, Crit. Rev. Immunol. 14: 89-129 (1994), herein incorporated by
reference). In addition, it should be noted that many of the milk-protein gene promoters have
rather similar atypical TATAA boxes, including the sequence 5'-TTTAAAT-3' in the rat and
mouse whey-acidic protein genes (Campbell and Rosen, Nucl. Acids Res. 12: 8685-8697
(1984)) and many of the casein genes (Yu-Lee et al., Nucl. Acids Res. 14: 1883-1902 (1986)).

Btn also lacks typical CCAAT elements in the expected context approximately 50
nucleotides upstream from TATA sequences (Breathnach and Chambon, Ann. Rev. Biochem.
50: 349-383 (1981), herein incorporated by reference). However, there are several potential
CCAAT-like elements (double underlined in FIG. 2), including the sequence 5'-ACAAAGT-3'
(nucleotides 4597-4603 of SEQ ID NO:1), which is within 50 nucleotides of the proximal
TATA box, and the sequences 5'-CCATTT-3' and 5'-CATTT-3' (nucleotides 4546-4551 and
4533-4537 of SEQ ID NO:1, respectively), which are 30-40 nucleotides upstream of the distal
TATA box. Of the milk-protein gene promoters sequenced to date, none have conventional
CCAAT boxes.

Mapping the 3' end of mouse butyrophilin mRNA by RT-PCR. The polyadenylation
signal sequence in Btn was identified by using RT-PCR to amplify four regions of cDNA
around the first potential polyadenylation (poly A) signal sequence (nucleotides 13091-13096
of SEQ ID NO:1) after the stop codon in Btn. Amplified products of the expected size were
obtained with primers 5' of the putative poly (A) signal sequence, while no RT-PCR products
were obtained with the primer pairs surrounding this poly (A) signal sequence or encompassing
a region 3' of nucleotide 13,199 (data not shown). These data suggest that the first potential
polyadenylation signal in Btn is the preferred termination signal and that the 3' end of the
transcripts lies between nucleotides 13,097 and 13,199.

The predicted 5' and 3' boundaries of Btn lead to estimates of approximately 8.40-
8.57 kb for the sizes of the initial gene transcripts, and values of 3.50-3.68 kb for the sizes of
the processed mRNAs. These latter estimates are in good agreement with a value of 3.7 kb
for the size of mouse butyrophilin mRNA determined by Northern analysis of total RNA from
lactating mouse mammary gland using mcDNA3 as an oligonucleotide probe (data not shown).

Sequence analysis of the butyrophilin gene sequence identified single inverted repeats
in the 5'-untranslated region (5'-UTR) and 3'-untranslated region (3'-UTR). Interestingly,
the repeat sequence in the 5'-UTR (nucleotides 4807-4814 of SEQ ID NO:1) is the exact
complement of the 3'-UTR sequence (nucleotides 12,556-12,563 of SEQ ID NO:1), suggesting
that these sequences play functional roles in the synthesis, stability or regulation of
butyrophilin transcripts.
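
A check of this kind can be sketched in a few lines. The sketch assumes that "exact complement" is meant in the sense of the reverse complement, which is the relationship expected of two segments able to base-pair with one another; a plain base-by-base complement is shown alongside for comparison. The two 8-mers are invented placeholders, not the actual UTR sequences.

```python
# Translation table for Watson-Crick base pairing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, read in the same direction as the input."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement read 5' to 3' on the opposite strand."""
    return complement(seq)[::-1]


def can_pair(first, second):
    """True if `second` is the exact reverse complement of `first`."""
    return reverse_complement(first) == second.upper()


# Hypothetical 8-mers standing in for the 5'-UTR and 3'-UTR repeats.
five_prime = "GGATCCAA"
three_prime = "TTGGATCC"
print(complement(five_prime), reverse_complement(five_prime))
print(can_pair(five_prime, three_prime))
```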

Translation of Mouse Butyrophilin mRNA. The predicted murine butyrophilin amino
acid sequence was derived after verification of exon/intron boundaries from the DNA
sequences of mouse cDNAs prepared by RT-PCR and the mouse gene sequence. Total RNA
was prepared from mouse mammary tissue (day 1 of lactation) (Chomczynski and Sacchi,
Anal. Biochem. 162: 156-159 (1987)) and reverse transcribed into cDNA by incubation with
MuMLV reverse transcriptase and random hexamers at 42°C for 15 min, following the
protocol described in the Perkin Elmer RT-PCR kit (Perkin Elmer Corp., Branchburg, NJ).
The cDNAs, mcDNA 1, 2, 3 and 4 (FIG. 1C), were then prepared by amplifying the indicated
regions of DNA by the PCR. The amino acid sequence was predicted from the verified
cDNA sequence using the TRANSLATE program from the Wisconsin Genetics Computer
Group (GCG) (SEQ ID NO:4). Based on this amino acid sequence, the translational initiation
codon, AUG, is predicted to be at nucleotides 4923-4925 of SEQ ID NO:1. This site is
consistent with the predicted location of translation initiation on bovine butyrophilin mRNA
and is also within the preferred context for most eukaryotic genes (Kozak, Nucl. Acids Res.
15: 8125-8148 (1987)). There are four other potential AUG initiation codons at positions
4650, 4743, 4765 and 4776 of SEQ ID NO:1 between the most distal transcriptional initiation
site at position 4611 and the predicted translational start site at position 4923 (SEQ ID NO:1).
However, the most distal of these AUG codons is not within the preferred sequence context
and the other three are almost immediately followed, in-frame, by the stop codons TAA, TGA
and TAG, respectively. In almost all such latter cases the ribosome continues to scan
the mRNA for the next potential AUG start site (Kozak, Nucl. Acids Res. 12: 3873-3893
(1984)).
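
The search for upstream AUG codons and their in-frame stop codons can be sketched as below. This is a minimal illustration rather than the procedure actually used: the leader sequence is a made-up toy, positions are 0-based string indices rather than SEQ ID NO:1 residue numbers, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}


def upstream_orfs(dna, main_start):
    """List (atg_index, stop_index) for every ATG located 5' of `main_start`.

    `stop_index` is the index of the first in-frame stop codon after the ATG,
    or None if the reading frame stays open to the end of `dna`.
    """
    dna = dna.upper()
    found = []
    for i in range(main_start):
        if dna[i:i + 3] != "ATG":
            continue
        stop = None
        # Walk codon by codon in the frame set by this ATG.
        for j in range(i + 3, len(dna) - 2, 3):
            if dna[j:j + 3] in STOP_CODONS:
                stop = j
                break
        found.append((i, stop))
    return found


# Toy leader: two short upstream ORFs followed by the main ATG at index 20.
leader = "GGATGTAACCATGGCTTGACATGGCCAAA"
print(upstream_orfs(leader, main_start=20))
```

An upstream ATG whose stop codon follows within a few codons, as in the toy leader, is the situation in which scanning is expected to resume and reach the next start site.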

Comparison of the DNA sequence of Btn with that of butyrophilin cDNA also revealed 
that, like many other genes in the IgSF (Williams and Barclay, 1988), there is a close 
correlation between exon organization and functional units of the protein. Thus, exon 1 
encodes all of the 5'-UTR and the signal sequence; the location of the signal sequence is
designated by the vertical dashed line in FIG. 1B. Exons 2 and 3 encode the I-set and C-set
immunoglobulin-like domains, respectively, and exon 4 encodes the membrane anchor.

Tissue Specific Expression of Mouse Butyrophilin. Previous work has suggested that 
butyrophilin is specifically expressed in mammary tissue and that expression is maximal during 
lactation (reviewed in Mather and Jack, 1993). However, this conclusion was based on the use 
of either relatively insensitive protein and RNA blotting techniques, or immunofluorescence 
microscopy. Thus, the expression of mRNA in mouse tissues was analyzed with a much more 
sensitive RNase protection assay. 

Riboprobes were prepared from a mouse cDNA, mcDNA3 (FIG. 1C), subcloned into
pCR II (Melton et al., Nucl. Acids Res. 12: 7035-7056 (1984)). For anti-sense riboprobe, the
plasmid was linearized by digestion with XbaI and the RNA synthesized using SP6 RNA
polymerase. For sense riboprobe, the plasmid was linearized by digestion with HindIII and
the RNA synthesized using T7 RNA polymerase. In each case the RNA was labelled by the
inclusion of [α-32P]-dUTP (~800 Ci/mmol) in the reaction mixtures. Total RNA was prepared
(Chomczynski and Sacchi, Anal. Biochem. 162: 156-159 (1987)) from 13 tissues (pancreas,
intestine, spleen, liver, kidney, heart, lung, uterus, ovary, thymus, brain, salivary gland, and
mammary) of three Balb/c mice at day 1 of lactation and mammary tissue was pooled from
three Balb/c mice at each of several developmental stages (pregnancy, lactation, and
involution). Anti-sense or sense riboprobes (2 x l(f cpm/sample) were incubated overnight
at 47°C with 10 µg total RNA in 30 µl of a hybridization solution (80% (v/v) formamide, 1
mM EDTA, 10 mM sodium citrate and 300 mM sodium acetate, pH 6.4 (Ambion, Austin,
TX)). The RNA in each sample was then digested at 37°C for 30 min with RNase ONE
(Promega) (5 U/sample) according to the manufacturer's instructions. RNA was recovered
following standard procedures (Sambrook et al., 1989) and the samples separated by
electrophoresis in a 6% (w/v) denaturing polyacrylamide gel. Radiolabelled riboprobe
protected from RNase digestion was detected by exposure of the dried gel to X-ray film.

The size of the anti-sense riboprobe was such that hybridization to butyrophilin mRNA 
was expected to protect a 625 bp RNA fragment from digestion with RNase. A radiolabelled 
fragment of the predicted size was only detected in mammary tissue, out of the total of 13
tissues analyzed (data not shown). Analysis of mammary tissue at different developmental
stages showed that butyrophilin mRNA is detectable during pregnancy, lactation and involution
but not in glands from virgin animals (data not shown). Expression of butyrophilin mRNA
appears to increase markedly in the last half of pregnancy and remains at relatively high levels
throughout lactation. 

Analysis of the Btn Promoter. Because Btn is specifically expressed in the mammary
gland and is associated with the MHC or MHC-related genes (Vernet et al., 1993; Amadou
et al., Genomics 26: 9-20 (1995)), a search for similarities between the Btn promoter and the
regulatory elements of mammary-specific or immune system genes was conducted.
Approximately 1.8 kb of Btn 5' flanking sequence, shown in SEQ ID NO:2, was analyzed on
either strand by comparison with sequences in the Transcription Factor Data Base (TFD)
(Faisst and Meyer, Nucl. Acids Res. 20: 3-26 (1992)) and by comparison with the published
sequences of the whey-acidic protein, α-lactalbumin, β-lactoglobulin, and casein genes.

Over thirty different classes of potential regulatory elements were identified throughout 
the sequence. Elements within Btn previously shown to be functional in the promoters of other
mammary-specific or immune system genes are indicated in FIG. 2. For the sake of clarity,
other elements are omitted, unless they are specifically discussed further below.

The mammary-related factors include three potential STAT binding sites identified
using the general STAT consensus 5'-TTNC(N)3AA-3' (Ihle and Kerr, Trends Genet. 11: 69-74
(1995), herein incorporated by reference) (asterisks, FIG. 2). Additional STAT binding sites
can be identified using a broader consensus, TT(N)5AA, based on the work of Lamb et al.
(Nucl. Acids Res. 23: 3283-3289 (1995), herein incorporated by reference) (no asterisks, FIG.
2). Several C/EBP sites were identified, including one between nucleotides -1505 and -1514,
which is the imperfect palindrome 5'-ATTAGGTAAT-3' (SEQ ID NO:5). There appear to
be no sites for the pregnancy-specific mammary nuclear factor (5'-TGAT/ATCA-3', Lee and
Oka, 1992, herein incorporated by reference) or the single-stranded nucleic acid binding
proteins (various consensus sequences checked, see Altiok and Groner, 1994, herein
incorporated by reference). Btn contains potential binding sites for NF1, Ets-related proteins
(PU.1 site, Klemsz et al., 1990, herein incorporated by reference), heptamer binding sites for
Oct 2A, which will bind Oct 1 (Kemler et al., EMBO J. 8: 2001-2008 (1989), herein
incorporated by reference), and glucocorticoid response element sites. There are several
YY1 sites and at least 11 GMCSF elements (Nimer et al., 1990, herein incorporated by
reference) which will also bind YY1 (Ye et al., Nucl. Acids Res. 22: 5672-5678 (1994), herein
incorporated by reference). Two negative regulatory elements characterized in the whey-acidic
protein gene promoter were identified (Kolb et al., 1994, herein incorporated by reference).
These elements (allowing one mismatch each) are within the appropriate context in Btn,
approximately 270 nucleotides apart in the proximal region of the promoter. Most
significantly, no "milk-box" region was found using the consensus sequence of Laird et al.
(1988), herein incorporated by reference, and no obvious CoREs with composite C/EBP,
glucocorticoid response elements and STAT5 sites (Raught et al., 1995, herein incorporated
by reference) were identified. Furthermore, comparison of the 5' flanking region of Btn with
the promoters of the casein, whey acidic protein, α-lactalbumin and β-lactoglobulin genes by
FASTA (Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85: 2444-2448 (1988)) or BLAST
(Altschul et al., J. Mol. Biol. 215: 403-410 (1990)) showed only limited similarities. The Btn
promoter therefore appears to have novel features with respect to the regulatory elements of
other mammary-specific genes.
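
A consensus search of the kind described for the STAT sites can be sketched with a regular expression. The sketch expands the general STAT consensus into the string TTNCNNNAA, treating N as any base, and scans one strand only; the opposite strand could be covered by scanning the reverse complement in the same way. The function name and the test sequence are illustrative assumptions, and this is not a reconstruction of the TFD search actually used.

```python
import re

# Regex fragments for the codes used in the consensus; N matches any base.
IUPAC_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

# General STAT consensus 5'-TTNC(N)3AA-3', written out in full.
GENERAL_STAT = "TTNCNNNAA"


def find_sites(sequence, consensus=GENERAL_STAT):
    """Return 0-based start indices of every match to `consensus`."""
    pattern = "".join(IUPAC_REGEX[code] for code in consensus)
    # The lookahead keeps overlapping matches that a plain search would skip.
    return [m.start()
            for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


# Hypothetical fragment containing one site, for illustration only.
print(find_sites("GGTTCCAGGAATT"))
```

Other consensus strings built from the same codes can be passed as the second argument.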

The Btn promoter lacks the characteristic response elements associated with classical
MHC class I and class II genes (Le Boutellier, 1994; Dorn et al., Proc. Natl. Acad. Sci.
U.S.A. 84: 6249-6253 (1987)). However, there are many potential regulatory elements
associated with immune system genes, including α- and γ-interferon response elements and
consensus sequences for TCF-1 (Faisst and Meyer, 1992, herein incorporated by reference)
(not shown in FIG. 2), and PU.1 (Klemsz et al., 1990). A repeat element of three GMCSF
sites in the proximal promoter was identified which in the same context has been shown to
regulate the mitogen-inducible expression of GMCSF in T cells (Nimer et al., 1990).

Accordingly, the mouse Btn promoter may be used to direct the expression of desirable
proteins in the milk of transgenic animals and to screen for compounds that are mitogenic. As
used herein, the term mouse Btn promoter means all the sequenced nucleotides from 1 to 4693
of SEQ ID NO:1 or a substantial equivalent. A substantial equivalent is defined as a DNA
sequence which enables a DNA fragment containing this sequence to hybridize under stringent
conditions to a DNA fragment containing nucleotides 1 to 4693 of SEQ ID NO:1.

In addition to the regulatory elements found in the promoter region of a gene, there is 
evidence that regulatory sequences involved in tissue-specific expression may also be located 
in the transcriptional unit of the gene or in 3' flanking sequences (See, e.g., Charnay et al.,
Cell 38: 251-263 (1984); Gillies et al., Cell 33: 717-728 (1983)). Thus, the cloned butyrophilin
gene may be used as a source of such regulatory sequences. For example, a rDNA construct
for expressing a heterologous protein may include a DNA sequence coding for the protein
inserted into the first exon of the Btn gene. Preferably, the insert is precisely fused to the Btn
signal sequence for targeting the heterologous protein into the secretory pathway normally
involved in secreting butyrophilin into milk.

Example 4: Cloning and Analysis of the Bovine Butyrophilin Promoter

The 5' untranscribed region of the bovine butyrophilin gene may be cloned from bovine 
genomic λ phage libraries by standard hybridization methods using the bovine butyrophilin 
cDNA disclosed in Jack and Mather (1990). By sequencing a clone containing the bovine 
promoter, herein referred to as BTN1, and comparing the sequence with the mouse promoter 
sequence, the boundaries of the bovine promoter and regulatory elements contained therein 
may be identified. 

Example 5: Preparation of Synthetic Butyrophilin Promoter Regions 

It will be understood by those skilled in the art that an entire butyrophilin promoter may 
not be necessary to provide a desired biological activity. For example, if production of a 
heterologous protein in milk is desired, there will be some minimal region or combination of 
regions within a butyrophilin promoter that is necessary and sufficient to respond to the 
transcription factors that control expression of the butyrophilin gene in lactating mammary 
tissue. On the other hand, if the object is to screen for compounds that are mitogenic, there 
will be some minimal region or combination of regions in a butyrophilin promoter that is 
necessary and sufficient to direct expression of butyrophilin or another gene in the presence of 
mitogens. Such minimal promoter regions that are necessary and sufficient to provide a desired 
biological activity may be identified by deletion analysis using methods well known in the art. 

In brief, deletion constructs are prepared containing increasingly smaller portions of 
λBtn1 or BTN1 operably linked to a reporter gene (e.g., see Example 6 below) and the 
amounts of reporter gene expression in response to various transcription factors are compared 
among the deletion constructs. The minimal promoter region(s) of λBtn1 or BTN1 which 
provide the desired response are then either subcloned from the deletion constructs or 
constructed from oligonucleotides synthesized on an automated DNA synthesizer. It will be 
understood that these minimal regions may comprise DNA sequences derived from a 
butyrophilin gene or their substantial equivalents, as defined above. These minimal promoter 
regions may then be operably linked to a desired coding sequence and placed in a recombinant 
expression vector. 
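
The bookkeeping behind such a deletion series can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the disclosed method: the fixed 3' end at nucleotide 4693 comes from the promoter definition above, while the step size, the minimum length, the activity threshold and the function names are hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Last promoter nucleotide of SEQ ID NO:1, as defined in the text above.
PROMOTER_END = 4693


@dataclass(frozen=True)
class DeletionConstruct:
    start: int  # first retained nucleotide (1-based, inclusive)
    end: int    # last retained nucleotide (1-based, inclusive)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def five_prime_deletions(end: int, step: int, min_length: int) -> list[DeletionConstruct]:
    """Nested constructs sharing one 3' end, each `step` nt shorter at the 5' end.

    `step` and `min_length` are hypothetical design parameters.
    """
    constructs = []
    start = 1
    while end - start + 1 >= min_length:
        constructs.append(DeletionConstruct(start, end))
        start += step
    return constructs


def minimal_active(activity: dict[DeletionConstruct, float], threshold: float) -> DeletionConstruct:
    """Return the shortest construct whose reporter activity reaches `threshold`."""
    active = [c for c, value in activity.items() if value >= threshold]
    if not active:
        raise ValueError("no construct reaches the threshold")
    return min(active, key=lambda c: c.length)
```

In use, the reporter activity measured for each construct would be entered by hand, and `minimal_active` would then point to the shortest fragment that still gives the desired response.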

Example 6 - Construction of Butyrophilin hGH Expression Vectors 

The Allegro® HGH Transient Gene Expression Immunoassay System (Nichols Institute 
Diagnostics, San Juan Capistrano, CA) may be used to evaluate butyrophilin promoters or 
promoter regions. In brief, a DNA fragment containing a butyrophilin promoter or minimal 
promoter region is cloned into the p0GH vector, which contains the human growth hormone 
(hGH) structural gene but lacks a eukaryotic promoter. The resulting fusion plasmid is 
transfected into a primary mammary cell line, and the hGH secreted into the medium is 
detected immunologically using a monoclonal antibody-based assay (Nichols Institute 
Diagnostics, #40-2205). Because the level of secreted hGH is proportional to hGH mRNA 
levels, the amount of hGH in the medium provides a measure of promoter activity. 
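
One simple way to compare such readings is to express each construct's secreted hGH relative to the promoterless vector. The Python sketch below is an assumption made for illustration, not a procedure stated in this specification: the function name, the use of replicate means and the choice of promoterless p0GH as the baseline are all hypothetical.

```python
from statistics import mean


def relative_activity(test_readings, control_readings):
    """Fold activity of a promoter construct over a promoterless control.

    Both arguments are replicate hGH concentrations from the culture medium,
    in the same (arbitrary) units. The baseline choice is an assumption.
    """
    baseline = mean(control_readings)
    if baseline <= 0:
        raise ValueError("control readings must have a positive mean")
    return mean(test_readings) / baseline
```

For example, replicate readings of 10, 12 and 14 units against a control averaging 1 unit would be reported as a 12-fold activity.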

In addition to hGH, other reporter genes such as those encoding chloramphenicol 
acetyltransferase (CAT), green fluorescent proteins or luciferase could be used to evaluate 
butyrophilin promoter regions. Detection of the products of these reporter genes is 
well known in the art. 

Example 7 - Construction of Transgenic Animals 

The production of transgenic mammals containing a foreign DNA sequence coding for 
a desired protein or polypeptide in their germ line is accomplished by procedures well known in 
the art. For example, see Rosen, U.S. Patent 5,304,489 (transgenic mice) and Clark et al., 
U.S. Patent 5,322,775 (transgenic sheep), each of which is herein incorporated by reference. 
Generally, the process comprises collection of embryos, injection of the DNA into the 
embryos, transfer of the surviving embryos to surrogate mothers, and screening the offspring 
for integration and expression of the foreign gene. To construct the transgenic animals 
embraced by the invention, the injected DNA would be a rDNA construct comprising a 
butyrophilin promoter or minimal butyrophilin promoter region(s) operatively linked to a DNA 
sequence encoding a desired polypeptide. The DNA construct preferably also comprises a 
signal sequence operatively linked to the DNA sequence encoding the desired polypeptide. 

In addition to constructing germ-line transgenic mammals, the invention contemplates 
the expression of desired coding sequences under the control of a butyrophilin promoter or 
promoter region(s) in somatic transgenic mammals. As described by Lothar Hennighausen, 
J. Cell. Biochem., 49: 325-332 (1992), herein incorporated by reference, such animals may 
be generated by the physical introduction of DNA with a jet injection gun into the mammary 
epithelial cells of a living lactating animal. See also Furth, P. A. et al., Gene transfer by jet 
injection into differentiated tissues of living animals and in organ culture, Mol. Biotechnol., 
4(2): 121-127 (Oct. 1995), herein incorporated by reference. 

Example 8 - Detection of Disease States Associated With Expression of Butyrophilin in 
Nonlactating Mammals 

As discussed above, butyrophilin is a member of the IgSF and its cytoplasmic domain 
is similar to the cytoplasmic domains in zinc-finger proteins. Thus, the expression of 
butyrophilin mRNA in nonmammary tissue or in mammary tissue of nonlactating animals may 
be useful for detecting cancer and other disease states in which the butyrophilin promoter is 
activated. 

The principles, preferred embodiments and modes of operation of the present invention 
have been described in the foregoing specification. The invention which is intended to be 
protected herein, however, is not to be construed as limited to the particular forms disclosed, 
since these are to be regarded as illustrative rather than restrictive. Variations and changes 
may be made by those skilled in the art without departing from the spirit of the invention. 

SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: MATHER Ph.D., IAN H. 
OGG Ph.D., SHERRY L. 
JACK Ph.D., LUCINDA J.W. 
KOMARAGIRI Ph.D., MADHAV V.S. 

(ii) TITLE OF INVENTION: THE BUTYROPHILIN GENE PROMOTER AND USES 
THEREOF 

(iii) NUMBER OF SEQUENCES: 5 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: WATSON COLE STEVENS DAVIS, P.L.L.C. 

(B) STREET: 1400 K. STREET, N.W. 

(C) CITY: WASHINGTON 

(D) STATE: D.C. 

(E) COUNTRY: USA 

(F) ZIP: 20005-2477 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY /AGENT INFORMATION: 

(A) NAME: POULOS III, JAMES A. 

(B) REGISTRATION NUMBER: 31714 

(C) REFERENCE/DOCKET NUMBER: 6067/JAP69I70A 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 202-628-0088 

(B) TELEFAX: 202-628-8034 



(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14180 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mus musculus 

(vii) IMMEDIATE SOURCE: 
(A) LIBRARY: 129 ES cell genomic library 

(B) CLONE: Lambda Btn1 

(viii) POSITION IN GENOME: 

(A) CHROMOSOME/ SEGMENT: 13 

(ix) FEATURE: 

(A) NAME/KEY: TATA_signal 

(B) LOCATION: 4645..4651 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 4611 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "transcription start site" 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 4675 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "transcription start site" 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 4694 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "transcription start site" 

(ix) FEATURE: 

(A) NAME/KEY: polyA_signal 

(B) LOCATION: 13091..13096 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 13097..13199 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "3' end of transcript" 

(ix) FEATURE: 

(A) NAME/KEY: misc_signal 

(B) LOCATION: 4923..4925 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "Translational initiation codon" 

(ix) FEATURE: 

(A) NAME/KEY: misc_signal 

(B) LOCATION: 4650..4651 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "Translational initiation codon" 
/pseudo 

(ix) FEATURE: 

(A) NAME/KEY: misc_signal 

(B) LOCATION: 4743..4745 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "Translational initiation codon" 
/pseudo 

(ix) FEATURE: 

(A) NAME/KEY: misc_signal 

(B) LOCATION: 4765..4767 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "Translational initiation codon" 
/pseudo 

(ix) FEATURE: 

(A) NAME/KEY: misc_signal 

(B) LOCATION: 4776..4778 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "Translational initiation codon" 
/pseudo 

(ix) FEATURE: 

(A) NAME/KEY: intron 

(B) LOCATION: 5002..5520 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "Intron A" 

(ix) FEATURE: 

(A) NAME/KEY: intron 

(B) LOCATION: 5872..8332 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "Intron B" 

(ix) FEATURE: 

(A) NAME/KEY: intron 

(B) LOCATION: 8615..9485 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "Intron C" 

(ix) FEATURE: 

(A) NAME/KEY: intron 

(B) LOCATION: 9636..10206 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "Intron D" 

(ix) FEATURE: 

(A) NAME/KEY: intron 

(B) LOCATION: 10228..10320 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "Intron E" 

(ix) FEATURE: 

(A) NAME/KEY: intron 

(B) LOCATION: 10348..10738 

(C) IDENTIFICATION METHOD: experimental 

(D) OTHER INFORMATION: /evidence= EXPERIMENTAL 
/standard_name= "Intron F" 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 4807..4814 

(D) OTHER INFORMATION: /standard_name= "Inverted repeat" 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 12556..12563 

(D) OTHER INFORMATION: /standard_name= "Inverted repeat" 

(ix) FEATURE: 

(A) NAME/KEY: promoter 

(B) LOCATION: 1..4693 

(ix) FEATURE: 

(A) NAME/KEY: sig_peptide 

(B) LOCATION: 4923..5001 

(ix) FEATURE: 

(A) NAME/KEY: misc_signal 

(B) LOCATION: 11395..11397 

(D) OTHER INFORMATION: /standard_name= "Translational stop site" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

AAGAATAAGA CACACACACA CACACACACA CACACACACA CACACACAGA GAGAGAGACA 60 

GAGACAGAGA CAGAGACAGA GAGAGACAGA GAGTAAAATG CATCCAAAAT ATAAATGGTT 120 

ATATAGAGAA ATAAAAGCAT AAATGAGTTA TTGTGTTCAG AAACGTGTAG ACAAGTAACA 180 

GATAAAGCAG TTTTACATAG ACTACCACAG TGGCTCATGA CATTAATTTC AGTATTCAAG 240 

AGACTGAGGC AGGAGGATCA CCATGGCTTC AAGACCATTC TAGACTATAT AGTAAGTTTC 300 

AGGCAAGCCT GGGGTACCTT GCCTCAAAAT AAATAAACAA ATAAAATTAA AAAAAAAAAA 360 

AGTAATGAGC AGCCATCTTG GCTTCACCTA CCTATAATCC CAGCCTTGGG AGGTGGAGGC 420 

AGGAGGATAA TTGCAGTGAT ATCTTCAGCT GTATAGTGTT CTTGGTGCCA CCCTGGGCAA 480 

TAAGATACAG TTCTCCACCC TCCTACAAAA GTACAGTTAT ACTTGTTTGC TTTTGAAAGA 540 

AGCTATGGAA GTTACCACCC TCAGGTGACT TTTGAGAGAG GGAGGGGAAT TAACCATGCA 600 

GACAAGACGG GGCAGAATGT TCTGGGAGAG ACGATGAGCA CCATTATCTG GAGGAGGTGC 660 

TTTGAGTAAC CACACCAATT CCGAGTTTGG CCTGCTAGTG GGACAGTGCA GGAAGAGGAA 720 

AGAGAAAGGC TTTTCCTTTT CTTCAATCTG TTACCATGGA AACATCTTTG TCATCTACAA 780 

AGAACACATT GGAGGAAGGA AAAGAAAAAA AAAAAAACAA ACACAACATG ATCTGTGAAT 840 

GAGTCTGTGT TGAGTCTCAT TCAGAGTCAC CCTGGAGAGA TGTGTTACAT GCGGCTGTGG 900 

GTCACAGGTT GAACATGCCC AGGACTACCA CAGTGCTTGT CCCTCCCTCC CTTCCAGCTG 960 

TCTTTCTCCT GTTTTTATTT TGAGACACAG TGTCCCTGTG TAGCCTAGCT GGTCTGTGTC 1020 

TTGTTTTGTA GGCCAGGAAC CCTCCCCCCA CCCCCAACAC ACACACACAC AAACTCAGAG 1080 

CAATGCTCCT GCATCAGCCT CCTGTCTGTT ACAAATAGTC ATCTTAATTA CATGTCTTCC 1140 

TAGAGCCTAA GGGTTCTGAC GTCAGTGTGG TTCCAAGTCC CAAGTGATGA CAAAAGCCCA 1200 

TCTTAAATTA TTCAGTAATC AGTAATATAT TTAAAGATTT ATTATTAGTA GTGTTAGTAT 1260 

TAGTAGCAGT ATTTGGTTTT TCGAGATAGG GTTTCTCTGT GTATCCCTGG CTGTCCTGGA 1320 

ACTCACTCTG TAGACCAGGC TGGCCTTGAA CTCAGAAATC CGCCTGCCTC TGCCTCCCAA 1380 

GTGCTGGGAT TAAAGGGGTG CACCACCACG GCCTTATTTT TCTTATTTTT CATTTGTGTG 1440 

TCTGTGTGTG TGCTCAATTG TGGGTTTGTG CACACAAGGG CAAATGCTCC AGCAGTCCAA 1500 

AGAGTGTCGG ACCCCTGGGA GCTGGCATCC CAGGTGGCCA TGAACCACTT GACATAGGTT 1560 

TCTCTACAAG AGCAGCCATG TTCTTAACTG CTGAAACATC TCTTCAACCC AACTGTTAAT 1620 

ATTTTTGTTC TTCACTCAAA TAAGCTAGGA TGGAACATTT AAATGTATTG TATACGTCAT 1680 

TTTAAAATAC AAATTGCCCA CAACTGATGA GGCAAGAGTT CGGAGTAAAG TTCTGAAACT 1740 

GCTATCTTGA TAATATGCAT TTCCTGTAGG TATGAAGGAG ATGAGTGTGG CATTTCTGGA 1800 

TAGCATTCAG ATACACAGGG ATTGTTACAT TCTCAGTCCT CATGCCAGTC CTCAGCATGC 1860 

AGAAAATATT AAGCAAAGAA GTAATAAAAT CAGATGTGTG CTTGGGAACA GCACACAACC 1920 

TAGCAGCCAA TCAGGCCAAC GAGAGTAGCT AGCTTTGTTC TTAATCATAG TAATTTTTAA 1980 

AAATTAAATA AAATTGAGAA GAAATGTTGT CAAAAATATA AAGCACTTAC TTTAAAATTT 2040 

GTTTTTAATT TATTTTTCAT TTATGTGGAT GCATGTTAGT CTGTCATGTG TGTGTGGATA 2100 

CCTGGGAGGC CCCTGGAATT TTAATTACGG ACCATGTGGG AGCTGGGAGC TGAGCCCGGG 2160 

CCCTCTGCAA AAGCAGCCAG CGCTCTTAAC TTCTAAGCCA TCTCCTCAGA CTTCAAATAT 2220 

AATACAATTA TTAACTTTAA TTTTTAAAAA GTCCACACAG AAAGAAGACC AGACCTCAAA 2280 

ATAGACAGCG ACTGGTCTGG AGGACTCCAG TCTAGATTTT ACCGAGTGGT CAGCTAATCC 2340 

AAGAGAAATG CCCAGCCTTG TTACACCACA AAGGTGATAA TGATGATACT AAATTTCACT 2400 

AATTTTCATA TAAGCATGTA AGATAATACC TCTTGTTGCT TGCAGGCCTG AAGGACACTC 2460 

TTTAGGATGC TATGATCTTT TTAATATTGT AGGGAAGGTC ACTGATAACA TATATATTTA 2520 

TGCCCTTTGA GTCAATGGCT TTATTCATGG AACTGGATCA AACAGCATAT CGGGTAGTTA 2580 

TCATAGTTAG AAACAAAGAG CTACAAATAA AAATGCATAT CTTTTTTCTA TTTTCTTCTT 2640 

CTTCCTCCCT CCTCCTCTTC CTCCTTTTCT TTCTTGTTTC TTCCTTTTTT TTTTTTTTTC 2700 

CCAGGCAGGC TCTCACTGCA TTGCTCTAGT TGTCCAGGAA CTTGATCTGT AGATCAGGCT 2760 

GGCCTCAAAC TCTTCCTACC TCTGCCTCCC GAATGCTAGG AATAGAAGCA TGTGTTAGCA 2820 

TGTCTGTTTT ACTGTGCGTT TAAAAAAAAA AAGTTTATCT TGCCCTTACT GTTTGCTACA 2880 

GGCTAGTAAA ACAAACAGAC ATGGTAGATC GATCTATCTG AGTTACAAAA ACAGACCTTC 2940 

TTCGAGCCGG ATTCGAACCA GCGACCTAAG GATTTCCAGG TCGAATACTC CTACAGTCCT 3000 

CCGCTCTACC AACTGAGCTA TCGAAGGATA CCATGTATAG TGCCTAGCAA AGTCACAAGT 3060 

AGCTTAGAGG AGCCACTATG CCTGATTTTA AGCAGTGCTG GGATCTAACT CAGGGCTTCA 3120 

TGAATGCTAG ATGGACCTTC TACCAAATGC CAAGTGCATT CTTTTTTTTT TTTTTTTTAA 3180 

TTAGGTAATT TCCTCATTTA CATTTCCAAT GCTATCCCAA AAGTCCTCCA TACCCTCCCC 3240 

CCGAAGTGCA GTCTTTATAC TAGAAAAAGA ACTAGAAATC TCATAATCTT CGCAAATATA 3300 

TGCGTATTAG CTATGCTATG AACTATGCAG GAAAACTTAC TATGAACTTA TCACTATGAA 3360 

CTGATATATA TTGTTCTTAA ATTTTATTTT ATATTTATGT ACAGCATAGA AACAATCATT 3420 

GATAAAACTG TTTTTTTTCT TTATCTTTGC ATTTTTTCAG TAATAAATGA AAATTCAAAA 3480 

CCAAATAAGA AATTGCTGAT CTCATGACTG ATGGCAGGGT GAAGCGCCAG GTCCTTGTGC 3540 

AGTTATACCT TGAAGGTGGA CATCCAGTGG ACTCCTGCCA CCCACACCCA CATTCCTGAA 3600 

GGTGTCTCAT GGAAAAGATC AGGGAGGGAG AGCTGCAGCC ATTGTGGACT CACTCTTTAG 3660 

CTATTCACAG ATGTAATGAC AAAGTAATTT ACTTTCTGGG CTCCTATTCT CTTGCCTGTT 3720 

TTGTTTCCAA TACTGTTTGT GTCTAATACT TTTCCAACTT GGCATAATTC AAACAAGGTA 3780 

TTAGTAACAT TAGTCTTTTT CTTAAAAGTA ACAAACACCC CACTCTCTTT TGTTTTGTTC 3840 

TCCATATGTA GCTCTTGCAA GTCTGGATCT TGCTATGAAG CCCAGACTGG CCTTAAACTT 3900 

ACAATGACCC CCGCCTGCGC CCCCCCCTCC CCCCCCATGA ACTTGGGTTA AAAGAACTGA 3960 

AGCCACAGAG TTAAATTCAC AGGCTGATGG CCTCATGACT CATTTCAGTT GCTCAAGTCT 4020 

TCTTTCTTTT TGTCCCCATT CCCTATATTC GGTACAGCTC TTTAATGCAT ATATCGTTCT 4080 

CTTAGGGGAG GAGGATGAAC CCAAACTACC TGACCACTAA TCTGTAGTCC ACATGTTTAA 4140 

AAGGCTGCTC CTCCCCCCAC CCCGAATAAA TACACTTGGT CACCTGTGGG CAGGCTTCTC 4200 

TAACAGCACA CAGCCTTCTT CCTTCTGAAG AGCTCTCTCT TTGGCCCCGG GGTGACAAGC 4260 

AGCCCTTTTC ACTTGATCAC TGTGGCTCTG GCTCCCTTTT CCTCTGGGTC TGTCGAAATC 4320 

GGTAGGTGCT TCACTCTCAG CTCAGCTCTC TTTGTCTCTT CTCTGTACTA GGCTTTCTGT 4380 

TCCTCAAGCT CTTCAGCTCT GCCTCTCCCC TCTCTCTCAG ACTTTGTCAA GACTGTATGT 4440 

ACCTCACGGT GTAACTCCCA GAGATCACCC TCCTGAGAGC TGCTGGGCTT ACAGTTGAGA 4500 

AACACACCTT GTCTTTCTCT CCTCCTTCGT TTCATTTCAT GTTCTCCATT TCTACCTCCG 4560 

TGGCTTTATC TTCATTATCA CTTCTAAACA CGAATAACAA AGTATCCCAC TCGATTCGAT 4620 

TTTACTTTAT TGTTTTATTG TTATTGTAAA TGAGGAGATT TCTTCATTAT CTACAACTGT 4680 

GCCTCGCGGC TCCATTCTGG AGGCAGTCGA GGGCTGGAGG ACCAGACGTA CAGAGGAAGG 4740 

GTATGGGGCA GGCGCTGTTG TAAAATGGAC TGAAAATGAC CCTGTAGGGG AAATACAGAG 4800 

CCCTCCAGGT TGGAAGAAAC TGGTGGAGAA CAGGGCGCTT GCGGAACCCA TAGTTACCTC 4860 

CTGACTGTTT CTCTCCCAGC CTGAAGCTCT TGGCGGGCTT CATTGCCCCA GTTAGCTCAG 4920 

AGATGGCAGT TCCCACCAAC TCCTGCCTCC TGGTCTGTCT GCTCACCCTC ACTGTCCTAC 4980 

AGCTGCCCAC GCTGGATTCG GGTAAGTTTC TGTTCTAGCC TTCTCTTTCT CGCAAAGTTG 5040 

GAAGGTCCCT ATAAATAAAT ACCTCTGACC CGGTTTGGCT CCTGGTGGGG GGACCTTCAC 5100 

CACAATCCAG TGAGTTCAAA GGAAACCCAC TGCGGGAGGT AATACACACC TGCAATTGCA 5160 

GCACCAGGTG GGCCCAGGCA GATTTTTCTG AGGTCAAAAC CTGCCTGGTC TACATAGAGA 5220 

TAGCCAGAGC ATCCAGGGTT ACGTAGGGAG CGCCTAGTTG TTTTTCCTTT AAATCAAAGG 5280 

AATTGGAACG CTAAGTGTGG TGGTGGTGCA CCCCTGTAAT GATCGCACTT GAGAATTGAG 5340 

GGCAAGGAGC TCAAGGCTAC ATAGTGAGCT GGAGGCCACC TTGGGATTTA TGAGATCCAG 5400 

TCTGAAAAAT AAACAGAAGA AAAGAAATAG CAGCCACCCC GAGTTCCTTT CTTTACAAGG 5460 

AGACTGGCCG GTAGGTCCTC CATCCCAACC CATCGTCCTA TCTGACCTTG TTTATTACAG 5520 

CAGCTCCCTT CGATGTGACC GCACCTCAGG AGCCAGTGTT GGCCCTAGTG GGCTCAGATG 5580 

CCGAGCTGAC CTGTGGCTTT TCCCCAAACG CGAGCTCAGA ATACATGGAG CTGCTGTGGT 5640 

TTCGACAGAC GAGGTCGACA GCGGTACTTC TATACCGGGA TGGCCAGGAG CAGGAGGGCC 5700 

AGCAGATGAC GGAGTACCGC GGGAGGGCGA CGCTGGCGAC AGCCGGGCTT CTAGACGGCC 5760 

GCGCTACTCT GCTGATCCGA GATGTCAGGG TCTCAGACCA GGGGGAGTAC CGGTGCCTTT 5820 

TCAAAGACAA CGACGACTTC GAGGAGGCCG CCGTATACCT CAAAGTGGCT GGTGGGTACA 5880 

GACGGGATGT GTCGCCTCGT CACTCCGCGC GGAGACTCTC ACTTTGGGGA GAATCATCGT 5940 

GTTCATTCTC CAAIVTCCAAA CGTATTTTCA CGTTTACGTA AGGTTGTGGT GAGCATCTTA 6000 

GATGCTCTGA ACAGCTTCGT GGTTTAATGC CTAAGGATTG ACACCCTAAC AGAGTGTGGT 6060 

CCGTTGCTAA AGTTCTTTAT CCACCTCCAA AATGGTTTTA CTCATATTAC TCATGTTGTC 6120 

TTCTTCTCCC TGTCTGAGAT CATAAGGAAA GAATACATTG AGCTCTAATT TCCCTCCCTG 6180 

TTAGTGATCC AAATCAAGCA AATCTCCCAC TCAGTTTTTC CTACTGTGAA ACCAGAAAGC 6240 

TAAATCCAGC AAGAATTTGC AACAAGGAAC TAGATAAGTG AAAAATGCTT TGTTAATGAT 6300 

AAAACATCAT GTGCTTATAA AGAAATTCCT ACACCTTAGA CTACTGTGTA TAATACACAT 6360 

ATTGCCTTTC TCATTTATTT AGGTATTTTC CTTGCTCCGT TAAGAAAGGA GCTGACATAG 6420 

TGTCTCAAAC TCTACAGCTT TAAGAACACT TTGAAGTCCT TTATCAAGTA CTAGGATCAT 6480 

TCGTAAAACA ATGAGTTTCC CACACCGGGA GTCGAACCCG GGCCGCCTGG GTGAAAACCA 6540 

GGAATCCTAA CCGCTAGACC ATGTGGGAAC TGCTATGCAT ACTTATCTTG CCTCCTCCTC 6600 

CCATGTAAGG ATTCCGGACG ATGACACACC TGCTCTTTAG ATGTTGGGAA AGGAATCTAT 6660 

CAACWAACT GTATCCCTAG CTCAAAAATA CATTGCCATG TTTTGCCATA TTTAATGTAC 6720 

CAAATATAAC GCTCATATCA TTTTTAGGGA AAGGCATCCT AAAATTATAT AATATATAAA 6780 

TTATATAATA TATACATATA CATGAAAATA TGTGTATATA CATATATGTA CATAAATATA 6840 

TGTTTATATT CACATATACT TGTGGGTTTG TGTATGATAT TTCAACTGGG AAGTAACACC 6900 

CTGTAATTCC AGCAACTGGG AGATACAAGC AGGAAGATTA GAAGTTCAAA ATTGACCTTG 6960 

GCTACTTAGA ACCTCGATGT TGTTATTATC TTTTATAAGT GATGGCCATC TTCATAAAAT 7020 

GAGTTTAAAT TTTTCATACA CACTCTTTTG AATATGAAGA GCTGTTGAGG TGTTGTTTAA 7080 

GGATACTTTT CTAGAGCTCA GAATTTTTCT GTATCCTGTA GAACTGTGAA AGGGGAGAGG 7140 

GGAGAAGGAA AAAGAGTTAG GAAAAGAGGA AAGAGGGAAA TAGAGGGAAG GAAAGATTAG 7200 

ACTAAATAAA AATGAAAAGT AGCTTTATGT TACCTTTGTT GCTGCTAATT TTCTGTTGCT 7260 

ATTTGTTTGT TTGATTTTTT TTTTTTTTTT TTTTTTTTTT TTTTTGCATC AGGGTGTTAC 7320 

TATGTAGCTT TGGCTGGCCC CAAACTTGCT ATGCAGACCA GGCTGGCCTA GAATTCATAG 7380 

AAAGCCACCT GCTTCACCCT CTCCAGCACT CAGATTAAAG GCCTAGACTA TCACTTTCTC 7440 

TGTTGTTATA GAGAAATGGT CTTGAACTCG TTATATAGCA GAGTCTGTTC TAGAACTCCT 7500 

GATTCTTCTC CCTCCACCTC CTGAATGCCA TGATTACAGT TGTGTGTCCC CTGTGTTGGT 7560 

TTTTGTTGGG GCATAATTCA GTGAGGAAAG ATGAGGTTGA AAACATTTAA GAAAATTCTT 7620 

GAGTCTGCAT CCTAGGTAAA GAAAAGTTAA ATTATCAACT GCAAACCTCA AGGGGAAAAA 7680 

CAAAACAAAA TTCCAAACTC TGTTCTCACG TATATAGTCT TTTGGGGAGT AGCGGTGTGA 7740 

CTCAGTTGGA AGTGTCTTTG CTTAGCATGC ACAAAGCCCT GGGTGGGATC TCTAGCACTG 7800 

TCTAAAAATG GTTTGTGGTG GCACGTGTCT CTAAGACCAG CATTTGGGAG GTAGAAGCAA 7860 

GAGGATCAGA AGTTCAAGGT CATCTTCGGC TGTTTGAGGT CAGCCTGTGC TACATGATAA 7920 

TCTGTCTAAG AAGGAGAATA CTTTCCCACC CATCCTAAAA TATTCTAACC ATAGTCATCT 7980 

CATCCTCCAA ATCATGTTAT GCACTTCTAA CCACAGAGGT CTTTCTTGAC TCTAGATCTC 8040 

TAAGGCACCT TGCAGCCATT GTTCTTGCTG TTCTTGATAG TTGGACAAAC ACCTCGATGT 8100 

CTACTCATCA GACTTTTCCA TATGTTGAAG GATGTGATCT CAATAAGGCT ACATCTCTAA 8160 

TAAGATATAA AACATTGTTT TTATTGATCC TTCAGTCTTC ATACAGAAGG ATTAGAGGAA 8220 

TCCTACCTAG CCCCAGTCTA CTTTTGCCTC CTCTCTGTCT TTCTCAAGCG AAGATTACCA 8280 

TGTTTCCAGG AAAGCATCCA CCAAAAGATT AAGGTCAGTT TCTCTCTAAT AGCTGTGGGT 8340 

TCAGATCCTC AAATCAGTAT GACGGTTCAA GAGAATGGAG AAATGGAGCT GGAGTGCACC 8400 

TCCTCTGGAT GGTACCCAGA GCCTCAGGTG CAGTGGAGAA CAGGCAACAG AGAGATGCTA 8460 

CCATCCACGT CAGAGTCCAA GAAGCATAAT GAGGAAGGCC TGTTCACTGT GGCAGTTTCA 8520 

ATGATGATCA GAGACAGCTC CATAAAGAAC ATGTCCTGCT GCATCCAGAA TATCCTCCTT 8580 

GGCCAGGGGA AGGAAGTAGA GATCTCCTTA CCAGGTCAGT GGAACTAGTG CTGGGTTCTC 8640 

ATGATGACAG AGACTCAGGC CAATATGACT TGGGACCCTG CTCAGAAGGG ACATCATGGC 8700 

AAAATTGTTT ACATCTTCCC CTACAGCTCT TGCCTGCTGA CTTAAGGAAA TCCTACCAAC 8760 

TAAATTAGAA TAAAGATACT TAGGGCTGGG CTGTATCTCT GAGTGCTTGT GTGGCATGCA 8820 

GAAGGTCCTA GGTTTTACCC CTTGGTCTGC ACACACCACC TCCATGCCAG TCTCATAAAA 8880 

ATTCCAGAGC TTTATTCCAG AGAAACAGGT GATAGAAAAG CTTTGCCTCT GGAGTCCTTC 8940 

CTGACAGGAC CCTTCTCCTT CAATAAGCAA GGAGAATAAA TTATTTTTTC TTCTGATTTG 9000 

ACTGTACCCT CTCTGAACAT TTCCTCCCTT CCTTGTTCCA CAATGGAGCT CCATATAGGC 9060 

CGCCAAAGAC TGCCAAGTTC CTCCAGGAAC TTTCATCATT TCCAATTTAT TACCTGTGAT 9120 

TTAGCAGGAA TCATTCCTTG TTTATTGGCC AATGATTTCC ATCCTATCTT GCATGCAATC 9180 

ACCTTTCCTC TTCCTTCCCT ACCTCAGCTA CCTCACTGAT AGTTAACAAG GGATTGCTGT 9240 

AAATTTTTAT TTCACATGTT CTGACCCCAA CTGGCTGTTC AGTGTTTGCT TTGGCTCAGG 9300 

GTCAAATCTT TCTGGAAAGC TTAGCCTGGA GGGGCAATTC TTGCTGTAGG CAGTGTGAGG 9360 

CCACTGAGAG CACTCCCATG TCTGTTCTCC TTTGGTATCA GGAGAGAAGC TGAAGTTGTT 9420 

CATTTTCCCA ACCAATGTCC TTTTCGGTTT GTTGTTTGTT CATTTTGTTT TGTGTTGTGT 9480 

TTTAGCTCCC TTCGTGCCAA GGCTGACTCC CTGGATAGTA GCTGTGGCTA TCATCTTACT 9540 

GGCCTTAGGA TTTCTCACCA TTGGGTCCAT ATTTTTCACT TGGAAACTAT ACAAGGAAAG 9600 

ATCCAGTCTG CGGAAGAAGG AATTTGGCTC TAAAGGTAAG TCACTGTCCC CAAGGGCTTT 9660 

GTGTCTCGGC TTCCAGGGAA GGTTGAATTC AGGGCTGTTT GGATGACTTC CAACAGGAAG 9720 

ATGCTGGATT TTAAAATTCC GAGGTTGGAA GGAACGATAA ACCTTCAAAA GTCACAGGTA 9780 

CCTACCTACT GTGAAGAAAA GTGCACGTGA CCCAGGCAAA GTCAAAATCA CCTGGAACTG 9840 

TCACTGTGTA CCTGATATTC TTTCACAGCC CAGCTGTAGG CTCTCTGGCC AGTCTAACTC 9900 

TGTTGCCCAG GAAGAATGTT CTTATTAAGA TCTAGCCCTG AGTCCTAAGC CAGGAGGAAC 9960 

TTCCAGGTGA TTTCTTAGAA ATATTCCGGG GAGTCTCTTG TTAATTAATT AATTTATTTA 10020 

ATATTTACAT TTTAGTTTAT TTTGTTTTGC TGGCAGCATT TCTGTTCCTG GTTTGCAGGC 10080 

AGAGTTCCTG TCACCAGGGC ACCACAGAGT AAACAGTGTC CCCTTGTGTG TCCCTCATTC 10140 

TGGTTTTCCT CCTTCCCCTT TCCCATTATA AAAAAAGCCA TTGACATAAT TTTGTTTGTT 10200 

TTCCAGAGAG ACTTCTGGAA GAACTCAGTA AGTATTTTTG TTTTGTTTTG TTTTGTTTTT 10260 

TTGTCACGAG ATTTTCTCTC TCCTACTTGT TAACTGATGG TCTCTTTCCT TGCGTTTCAG 10320 

GATGCAAAAA GACTGTACTG CATGAAGGTC AGTGGTTCTG AGCTCCTCAC TGCCTCTGAA 10380 

GCCCTTCCGT GGGAGTCAAA GACCTGGGAG GCTTGCACTC CAGACTACCT CCTTAGTAAC 10440 

AGGATAGAAA CAGrGGAAGGT GACAGCGAAT GGTCTCAGCG CTTTCTGGGA GGCATCGCGA 10500 

GGACCACTAG CTAGCAGAAG AGCTCCTTTG AGGGATACCG CATTTGATAG TTCTTAAGTC 10560 

ATGCCGTAGC TGCCAGTAAG AGATTGGGGC TAGAGAGAAG GACTGCTAGT GAGTGGCCTG 10620 

ATAGCTCCCC TACCACAGCT CCTGCAACTC TATTCCACGT CTCTGGGAAG GGGAGATAAT 10680 

TCGGGTAGTC TTGATACGGG GACAGGCTGA TGCAGTCTCT CTTTGCCTCC AGTTGACGTG 10740 

ACTCTGGATC CAGACACAGC CCACCCCCAC CTCTTCCTGT ATGAAGATTC AAAGTCAGTT 10800 

CGATTGGAAG ATTCACGTCA GATCCTGCCT GATAGACCAG AGAGATTTGA CTCCTGGCCC 10860 

TGTGTGTTGG GCCGTGAGAC CTTTACTTCA GGGAGACATT ACTGGGAGGT GGAGGTGGGA 10920 



GATAGAACTG ACTGGGCCAT TGGTGTGTGT AGGGAGAATG TGGTGAAGAA AGGGTTTGAC 10980 

CCCATGACTC CTGATAATGG GTTCTGGGCT GTGGAGTTGT ATGGAAATGG GTACTGGGCC 11040 

CTCACCCCAC TCAGGACCTC TCTCCGATTA GCAGGGCCCC CTCGCAGAGT TGGGGTTTTT 11100 

CTGGACTATG ACGCAGGAGA CATTTCCTTC TACAACATGA GTAACGGATC TCTTATCTAT 11160 

ACTTTCCCTA GCATCTCTTT CTCTGGCCCC CTCCGTCCCT TCTTTTGTCT GTGGTCCTGT 11220 

GGTAAAAAGC CCCTGACCAT CTGTTCAACT GCCAATGGGC CTGAGAAAGT CACAGTCATT 11280 

GCTAATGTCC AGGACGACAT TCCCTTGTCC CCGCTGGGGG AAGGCTGTAC TTCTGGAGAC 11340 

AAAGACACTC TCCATTCTAA ACTGATCCCG TTCTCACCTA GCCAAGCGGC ACCATAACAA 11400 

ATATTCCAGC TTCACGACTT TGCCTTCCTT TGACTAATCC CTCATGCCCC GAAGCTTCAG 11460 

CTGTTGGCTT CTTGCAGCCC TGCTTCTTCC TGGTGGATGG AGATTAATTC ACATTGGGAA 11520 

GGTTAGGTAT GTTGCTGCCA GACAAGGCAG GAAGAAAGGC CATCCTAGTT TGTTTCTGTA 11580 

CTAACAGTGG GGAGGAAGAG AGCTGAATCC TAAACTATTT CCAGTGCTCA TATTCCTTCA 11640 

GGCCAGAGCC TATAGAGAAG GATTTGGTAC AATCACTCGA GGGATCAAGA GGCAATTAGG 11700 

TTGGCATGGA ATTATGGCAG AAACATCTGG AATAGGGGTA TGTGGAATGA CAGGTTTTAG 11760 

GTAAGGGAGA ACAAAACCAA ACCATAGGAT GCTGAGAAAG AAAGATCTTG GACTAAACTC 11820 

CTAAAAAAGC ACTTAGAGAA GATATGACAG GCAAATGAAG TGAATTTGGT CTAATTTGAT 11880 

ACACTTGCCC TGTCCCTAGG GTTTTTCAGT TATATCTCAA TTTTTTTGTT GTTAATTACA 11940 

TTTTTGACAG CTTCATACAT GTATATAATG CATTCTAATT ACTCTCACTC TCCTCTATTC 12000 

TGTCTTATTT CCCTCCCCTC CCCTCATACC TTCCTTCTTG CTTCAAACCT GGCACACTGA 12060 

GTTTAATGGG CTATCATGGG AACATGGATT TAGAGCTTTC CTCTGAGCTC AAGAGAGCAG 12120 

GTGTGACTGA ATACAGTGAT TTCCCCTCTC CTACAATCAA TCAGCAGTCA ATAGCTCAGC 12180 

TGGGAGGGGT AGGGCCTCAT GAGACTTCCC CTATCAAGGC TAAATGTTGA AAGGGCCAGT 12240 

TTTTAGCACC TGTGAGATCA TGATTGCAAG AGCCCAGAAG ACAGCATTGC TCGGTCATTC 12300 

TCCCTACCCT TTGGCTTTTC TGGTCTTTTG TCCTCTCTTT CAGGATGTGT CTGAACTCTG 12360 

TATCTTAAGT TTTCTATGTC ATGTTCTATA AGATAGAGGA GACTGGCCCT GCTTGTTTGA 12420 

GAGCAATGTG AGCAAGCTAG CAAGAGACAG AAAGGAGCGG AGATGAATAG GGGTAGAGAA 12480 

AATTTTTAAA CAAACCCTCC AGGTGTGTGT GTGTGTGTGT GTGTCTTCCT CTTTTTTGAC 12540 

CTCCCTAAAG GTCAATCCAA CCTCACATTA TTGACTCCAC TAGGTGGGGG TTCTGTGTGT 12600 

GTGTGTGTGT GTGTGTGTGT GTGTGTGTGT GTGTTTTAAG ATAGAGGTTT ACTATGTAGC 12660 

TTAGGCTGGC TTTGAATTCC TGATCCTCCT GCCTCTACCT TCCAAGTGCT GGAAACATAG 12720 

CCACATCCAC CACCCCTATC CAGTCCACCT GGTTTGATTC AGCAACGCTC AGGTAGCATC 12780 

GCTGTTTGAT CTGGAGCTGC CAGCTCCCTC GGCCCCCACT GCAATGCTTA ACCCCCTCAC 12840 

AGGCACCTTC CCTTGCCTAA CACTGCCATC CTTTTCCACA CTGAGCCATT TGCTCAATGT 12900 

AGCCTACCCA GGTATCCTGC TTTCTGGTCC CCAAAGTTAC ACCATGATGC TCAGCACAGC 12960 

TGGACAGTTT GTCCCAATTT GTGTGTGTCC TCCTGTTTGT ATGGGACTTC TTTTTGTCAA 13020 

TGGCCTGTGT GTGTATCCAA GCTCTTCCAC TTCTATTGTA TTTTTCCGGC TTCTAAAACA 13080 

GATGTTACCA AATAAAGAAA GAGAAAGAAA CGAATGTCTG TTTGCTGAAG GCAGCCTCTG 13140 

AACTTTTCTT TCTTTATCCC AATAAGAGGG ACTGGATTAA ACCGAAACAG GAATGAGCGC 13200 

TGCCTGTCTG GGAAAGTCCT ATTGCAGCAG GGCTGTTCTG TATGGTCCGA GGCTTAGGAC 13260 

TGGGAGATTT ACCAGACCAG GCAGAGATGG GAGCTACTCA TGAGGATCAA ACTACCTTCA 13320 

AAGAGGCCAC TGTGCTGATG GCTTCCTGCT CTCAGCCTTG TTTCAAAGGC AACTTCATTT 13380 

CTATCCCCAC TAAGGTAACT TTGTTGGTGA GTAAACTCCA ACACGGTGCC AGATGTACCA 13440 

AGAGGGTGCA GCTCCACAGT AGAGTTCTTG CCCGCCATGC ACCTGCGTGT GTTCCATCCC 13500 

TAGCACTGCT TCTGCCCCAC ACATGATTCC TAACAAGTCT CCAAAGACAT GAAAATTGGG 13560 

GGATACATTC AAACCACTGC AGGTTCTTTC CTCTATCACC TCATGGGTCC CCGGTGCCCA 13620 

GTGTCTTCCT TCTCTTTTTT ATCTCAAACA CTAGCCACCC TATGCAGCTT GTCTTTTACT 13680 

GTACTCCTAG GAGAACGGTA TAATTTACCT TTGATTTAAG AGAATTAACT TAATTGAGTG 13740 

TGGTGACATG GGCCTGTATT CCCCAGAATT CAGAAGACAG AGGCAAGAGA ATTGTCACAT 13800 

ATTTGAAGCC AGCTTGGACT ATATGTCAAT TCAAGGTCAG CCTAAGCTAT ACAGTAACAC 13860 

CCTATCTCAT TAAATAAATA AATAAATAAA TGTGTTCATT TTATTCAAAT ATTTTACTTG 13920 

TAGAAATCCA CAGAAAATAT AGTCGAAACA TCCTTTCAAA AATTGGTGAG ATGGCTCACC 13980 

AGATAAAGAC ACTTACTTGC CAAACCTGAT GACCCGAGTT CAACCCCAGC GACCCACATG 14040 

GTGGAGTGAA TTGTCCTCTG ATATCCACAT GTTTGTCATA GATCATGCTC ACCCATACAC 14100 

ATATACACAT ACACATGCTA AATATGTTCC ATGTCTAAGA AAGGTAGACT GTTGCATCAC 14160 

TGTGTTTAAT GTGTGACAAG 14180 

(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1750 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI- SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mus musculus 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: 129 ES Cell Genomic Library 

(B) CLONE: Lambda Btn1 

(viii) POSITION IN GENOME: 

(A) CHROMOSOME/SEGMENT: 13 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

CGAGCCGGAT TCGAACCAGC GACCTAAGGA TTTCCAGGTC GAATACTCCT ACAGTCCTCC 60 

GCTCTACCAA CTGAGCTATC GAAGGATACC ATGTATAGTG CCTAGCAAAG TCACAAGTAG 120 

CTTAGAGGAG CCACTATGCC TGATTTTAAG CAGTGCTGGG ATCTAACTCA GGGCTTCATG 180 

AATGCTAGAT GGACCTTCTA CCAAATGCCA AGTGCATTCT TTTTTTTTTT TTTTTTAATT 240 

AGGTAATTTC CTCATTTACA TTTCCAATGC TATCCCAAAA GTCCTCCATA CCCTCCCCCC 300 

GAAGTGCAGT CTTTATACTA GAAAAAGAAC TAGAAATCTC ATAATCTTCG CAAATATATG 360 

CGTATTAGCT ATGCTATGAA CTATGCAGGA AAACTTACTA TGAACTTATC ACTATGAACT 420 

GATATATATT GTTCTTAAAT TTTATTTTAT ATTTATGTAC AGCATAGAAA CAATCATTGA 480 

TAAAACTGTT TTTTTTCTTT ATCTTTGCAT TTTTTCAGTA ATAAATGAAA ATTCAAAACC 540 

AAATAAGAAA TTGCTGATCT CATGACTGAT GGCAGGGTGA AGCGCCAGGT CCTTGTGCAG 600 

TTATACCTTG AAGGTGGACA TCCAGTGGAC TCCTGCCACC CACACCCACA TTCCTGAAGG 660 

TGTCTCATGG AAAAGATCAG GGAGGGAGAG CTGCAGCCAT TGTGGACTCA CTCTTTAGCT 720 

ATTCACAGAT GTAATGACAA AGTAATTTAC TTTCTGGGCT CCTATTCTCT TGCCTGTTTT 780 

GTTTCCAATA CTGTTTGTGT CTAATACTTT TCCAACTTGG CATAATTCAA ACAAGGTATT 840 

AGTAACATTA GTCTTTTTCT TAAAAGTAAC AAACACCCCA CTCTCTTTTG TTTTGTTCTC 900 

CATATGTAGC TCTTGCAAGT CTGGATCTTG CTATGAAGCC CAGACTGGCC TTAAACTTAC 960 

AATGACCCCC GCCTGCGCCC CCCCCTCCCC CCCCATGAAC TTGGGTTAAA AGAACTGAAG 1020 

CCACAGAGTT AAATTCACAG GCTGATGGCC TCATGACTCA TTTCAGTTGC TCAAGTCTTC 1080 

TTTCTTTTTG TCCCCATTCC CTATATTCGG TACAGCTCTT TAATGCATAT ATCGTTCTCT 1140 

TAGGGGAGGA GGATGAACCC AAACTACCTG ACCACTAATC TGTAGTCCAC ATGTTTAAAA 1200 

GGCTGCTCCT CCCCCCACCC CGAATAAATA CACTTGGTCA CCTGTGGGCA GGCTTCTCTA 1260 

ACAGCACACA GCCTTCTTCC TTCTGAAGAG CTCTCTCTTT GGCCCCGGGG TGACAAGCAG 1320 

CCCTTTTCAC TTGATCACTG TGGCTCTGGC TCCCTTTTCC TCTGGGTCTG TCGAAATCGG 1380 

TAQGTGCTTC ACTCTCAGCT CAGCTCTCTT TQTCTCTTCT CTGTACTAGG CTTTCTGTTC 1440 

CTCAAGCTCT TCAGCTCTGC CTCTCCCCTC TCTCTCAGAC TTTGTCAAGA CTGTATGTAC 1500 

CTCACGGTGT AACTCCCAGA GATCACCCTC CTGAGAGCTG CTGGGCTTAC AGTTGAGAAA 1560 

CACACCTTGT CTTTCTCTCC TCCTTCGTTT CATTTCATGT TCTCCATTTC TACCTCCGTG 1620 

GCTTTATCTT CATTATCACT TCTAAACACG AATAACAAAG TATCCCACTC GATTCGATTT 1680 

TACTTTATTG rTTTATTGTT ATTGTAAATG AGGAGATTTC TTCATTATCT ACAACTGTGC 1740 

CTCGCGGCTC 1750 
(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "oligodeoxynucleotide" 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: YES

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Mus musculus

(viii) POSITION IN GENOME:

(A) CHROMOSOME/SEGMENT: 13



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

GGGCTCTGTA TTTCCCCTAC 
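
SEQ ID NO: 3 is listed as an antisense oligodeoxynucleotide, so the sense-strand sequence it pairs with is its reverse complement. The following is a minimal Python sketch of that conversion; the function and variable names are illustrative assumptions and are not part of the listing.

```python
# SEQ ID NO: 3 as given in the listing (written 5' -> 3', marked ANTI-SENSE: YES).
SEQ_ID_NO_3 = "GGGCTCTGTATTTCCCCTAC"

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    seq = seq.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # Complement every base, then reverse so the result also reads 5' -> 3'.
    return seq.translate(_COMPLEMENT)[::-1]


sense_strand = reverse_complement(SEQ_ID_NO_3)
print(len(SEQ_ID_NO_3), sense_strand)  # 20 GTAGGGGAAATACAGAGCCC
```

The length check against the stated 20 base pairs is a quick way to catch transcription errors in the oligonucleotide.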

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 524 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mus musculus 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: 129 ES Cell Genomic Library

(B) CLONE: Lambda BTN1

(viii) POSITION IN GENOME: 

(A) CHROMOSOME/SEGMENT: 13 

(ix) FEATURE: 

(A) NAME/KEY: Domain

(B) LOCATION: 244..270

(D) OTHER INFORMATION: /note= "Membrane anchor domain"

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 1..26

(D) OTHER INFORMATION: /note= "Signal Peptide"

(ix) FEATURE:

(A) NAME/KEY: Domain

(B) LOCATION: 27..143

(D) OTHER INFORMATION: /note= "I-set immunoglobulin-like domain"

(ix) FEATURE:

(A) NAME/KEY: Domain

(B) LOCATION: 144..237

(D) OTHER INFORMATION: /note= "C-set immunoglobulin-like domain"

(ix) FEATURE:

(A) NAME/KEY: Region

(B) LOCATION: 304..469

(D) OTHER INFORMATION: /note= "B30.2 Region"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Ala Val Pro Thr Asn Ser Cys Leu Leu Val Cys Leu Leu Thr Leu
1 5 10 15

Thr Val Leu Gln Leu Pro Thr Leu Asp Ser Ala Ala Pro Phe Asp Val
20 25 30

Thr Ala Pro Gln Glu Pro Val Leu Ala Leu Val Gly Ser Asp Ala Glu
35 40 45

Leu Thr Cys Gly Phe Ser Pro Asn Ala Ser Ser Glu Tyr Met Glu Leu
50 55 60

Leu Trp Phe Arg Gln Thr Arg Ser Thr Ala Val Leu Leu Tyr Arg Asp
65 70 75 80

Gly Gln Glu Gln Glu Gly Gln Gln Met Thr Glu Tyr Arg Gly Arg Ala
85 90 95

Thr Leu Ala Thr Ala Gly Leu Leu Asp Gly Arg Ala Thr Leu Leu Ile
100 105 110

Arg Asp Val Arg Val Ser Asp Gln Gly Glu Tyr Arg Cys Leu Phe Lys
115 120 125

Asp Asn Asp Asp Phe Glu Glu Ala Ala Val Tyr Leu Lys Val Ala Ala
130 135 140

Val Gly Ser Asp Pro Gln Ile Ser Met Thr Val Gln Glu Asn Gly Glu
145 150 155 160

Met Glu Leu Glu Cys Thr Ser Ser Gly Trp Tyr Pro Glu Pro Gln Val
165 170 175

Gln Trp Arg Thr Gly Asn Arg Glu Met Leu Pro Ser Thr Ser Glu Ser
180 185 190

Lys Lys His Asn Glu Glu Gly Leu Phe Thr Val Ala Val Ser Met Met
195 200 205

Ile Arg Asp Ser Ser Ile Lys Asn Met Ser Cys Cys Ile Gln Asn Ile
210 215 220

Leu Leu Gly Gln Gly Lys Glu Val Glu Ile Ser Leu Pro Ala Pro Phe
225 230 235 240

Val Pro Arg Leu Thr Pro Trp Ile Val Ala Val Ala Ile Ile Leu Leu
245 250 255

Ala Leu Gly Phe Leu Thr Ile Gly Ser Ile Phe Phe Thr Trp Lys Leu
260 265 270

Tyr Lys Glu Arg Ser Ser Leu Arg Lys Lys Glu Phe Gly Ser Lys Glu
275 280 285

Arg Leu Leu Glu Glu Leu Arg Cys Lys Lys Thr Val Leu His Glu Val
290 295 300

Asp Val Thr Leu Asp Pro Asp Thr Ala His Pro His Leu Phe Leu Tyr
305 310 315 320

Glu Asp Ser Lys Ser Val Arg Leu Glu Asp Ser Arg Gln Ile Leu Pro
325 330 335

Asp Arg Pro Glu Arg Phe Asp Ser Trp Pro Cys Val Leu Gly Arg Glu
340 345 350

Thr Phe Thr Ser Gly Arg His Tyr Trp Glu Val Glu Val Gly Asp Arg
355 360 365

Thr Asp Trp Ala Ile Gly Val Cys Arg Glu Asn Val Val Lys Lys Gly
370 375 380

Phe Asp Pro Met Thr Pro Asp Asn Gly Phe Trp Ala Val Glu Leu Tyr
385 390 395 400

Gly Asn Gly Tyr Trp Ala Leu Thr Pro Leu Arg Thr Ser Leu Arg Leu
405 410 415

Ala Gly Pro Pro Arg Arg Val Gly Val Phe Leu Asp Tyr Asp Ala Gly
420 425 430

Asp Ile Ser Phe Tyr Asn Met Ser Asn Gly Ser Leu Ile Tyr Thr Phe
435 440 445

Pro Ser Ile Ser Phe Ser Gly Pro Leu Arg Pro Phe Phe Cys Leu Trp
450 455 460

Ser Cys Gly Lys Lys Pro Leu Thr Ile Cys Ser Thr Ala Asn Gly Pro
465 470 475 480

Glu Lys Val Thr Val Ile Ala Asn Val Gln Asp Asp Ile Pro Leu Ser
485 490 495

Pro Leu Gly Glu Gly Cys Thr Ser Gly Asp Lys Asp Thr Leu His Ser
500 505 510

Lys Leu Ile Pro Phe Ser Pro Ser Gln Ala Ala Pro
515 520
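
The feature table for SEQ ID NO: 4 gives 1-based, inclusive residue coordinates. The sketch below, offered only as an illustration, computes the length of each annotated feature and extracts the signal peptide from the first 32 residues of the listing, written with standard three-letter codes. All Python names are assumptions introduced for this example, and only the first 32 residues are reproduced to keep it short.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Residues 1-32 of SEQ ID NO: 4 (excerpt only, not the full 524 residues).
FIRST_32 = (
    "Met Ala Val Pro Thr Asn Ser Cys Leu Leu Val Cys Leu Leu Thr Leu "
    "Thr Val Leu Gln Leu Pro Thr Leu Asp Ser Ala Ala Pro Phe Asp Val"
)

# Feature coordinates copied from the (ix) FEATURE entries (1-based, inclusive).
FEATURES = {
    "Signal peptide": (1, 26),
    "I-set immunoglobulin-like domain": (27, 143),
    "C-set immunoglobulin-like domain": (144, 237),
    "Membrane anchor domain": (244, 270),
    "B30.2 region": (304, 469),
}
TOTAL_LENGTH = 524  # stated under SEQUENCE CHARACTERISTICS


def to_one_letter(three_letter: str) -> str:
    """Convert a whitespace-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[residue] for residue in three_letter.split())


def feature_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval."""
    return end - start + 1


def slice_feature(sequence: str, start: int, end: int) -> str:
    """Extract a 1-based inclusive interval from a one-letter sequence."""
    return sequence[start - 1:end]


excerpt = to_one_letter(FIRST_32)
signal_peptide = slice_feature(excerpt, *FEATURES["Signal peptide"])
print(signal_peptide)  # MAVPTNSCLLVCLLTLTVLQLPTLDS

for name, (start, end) in FEATURES.items():
    assert end <= TOTAL_LENGTH
    print(f"{name}: {start}..{end} ({feature_length(start, end)} residues)")
```

The conversion raises a `KeyError` on any non-standard residue name, which makes it a convenient check for transcription errors in the three-letter listing.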

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "oligodeoxynucleotide" 

(iii) HYPOTHETICAL: YES 

(iv) ANTI-SENSE: NO



(ix) FEATURE: 

(A) NAME/KEY: protein_bind 

(B) LOCATION: 1..10 

(D) OTHER INFORMATION: /bound_moiety= "Transcription factor"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
ATTAGGTAAT 
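
A short motif such as SEQ ID NO: 5 can be located in a longer sequence with a plain substring search. The Python sketch below searches the stretch of SEQ ID NO: 2 that the listing numbers 181 to 300, as transcribed in this document, and reports 1-based start positions. The excerpt boundaries, function name, and variable names are assumptions made for illustration.

```python
# SEQ ID NO: 5 as listed.
MOTIF = "ATTAGGTAAT"

# Positions 181-300 of SEQ ID NO: 2 as transcribed in the listing above;
# the spaces separating the 10-base blocks are removed before searching.
EXCERPT_START = 181
EXCERPT = (
    "AATGCTAGAT GGACCTTCTA CCAAATGCCA AGTGCATTCT TTTTTTTTTT TTTTTTAATT "
    "AGGTAATTTC CTCATTTACA TTTCCAATGC TATCCCAAAA GTCCTCCATA CCCTCCCCCC"
).replace(" ", "")


def find_all(sequence: str, motif: str, offset: int = 1) -> list[int]:
    """Return the 1-based start positions of every (possibly overlapping) match.

    `offset` is the listing position of the first character of `sequence`.
    """
    positions = []
    index = sequence.find(motif)
    while index != -1:
        positions.append(index + offset)
        index = sequence.find(motif, index + 1)
    return positions


hits = find_all(EXCERPT, MOTIF, offset=EXCERPT_START)
print(hits)  # [238]
```

In this excerpt the motif spans the line break between the rows ending at 240 and 300, which is why the block spacing has to be stripped before searching.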
We claim:

1. A purified and isolated DNA fragment comprising a DNA sequence having the 
biological activity of a butyrophilin promoter. 

2. The DNA fragment of claim 1, wherein the DNA sequence comprises at least
one minimal promoter region from the mouse Btn promoter, the bovine BTN promoter, or their
substantial equivalents.

3. The DNA fragment of claim 2, wherein the minimal promoter region is from 
the mouse Btn promoter. 

4. The DNA fragment of claim 1, wherein the DNA sequence is selected from the
group consisting of:

(a) nucleotides 1 to 4693 of SEQ ID NO: 1; and

(b) DNA sequences which are substantial equivalents of the sequences 
defined in (a). 

5. The DNA fragment of claim 4, wherein the DNA sequence further comprises
nucleotides 4694-4922 of SEQ ID NO: 1, wherein the nucleotides 4694-4922 are contiguous
with the nucleotides 1 to 4693.

6. The DNA fragment of claim 5, wherein the DNA sequence further comprises
nucleotides 4923-5001 of SEQ ID NO: 1, wherein the nucleotides 4923-5001 are contiguous
with the nucleotides 4694-4922.

7. The DNA fragment of claim 6, wherein the DNA sequence further comprises
nucleotides 5002-14180 of SEQ ID NO: 1, or its complementary sequence, wherein the
nucleotides 5002-14180 are contiguous with the nucleotides 4923-5001.
8. The DNA fragment of claim 1, wherein the DNA sequence is the bovine BTN
promoter.

9. A rDNA construct for expressing a polypeptide in the mammary gland of a 
mammal, the rDNA construct comprising: 

(a) a first DNA sequence having the biological activity of the butyrophilin 
promoter; and 

(b) a second DNA sequence encoding the polypeptide operatively linked to the first
DNA sequence.

10. The rDNA construct of claim 9, wherein the first DNA sequence comprises at
least one minimal butyrophilin promoter region.

11. The rDNA construct of claim 10, wherein the minimal promoter region is from
the mouse Btn promoter, the bovine BTN promoter, or their substantial equivalents.

12. The rDNA construct of claim 9, wherein the first DNA sequence is selected 
from the group consisting of: 

(a) a DNA sequence comprising nucleotides 1 to 4693 of SEQ ID NO: 1;
and 

(b) DNA sequences which are substantial equivalents of the sequence 
defined in (a). 

13. The rDNA construct of claim 12, wherein the first DNA sequence further
comprises nucleotides 4694-4922 of SEQ ID NO: 1, or its complementary sequence, contiguous
with the nucleotides 1 to 4693.
14. The rDNA construct of claim 9, further comprising a third DNA sequence
encoding a protein signal sequence operatively linked between first and second DNA 
sequences. 

15. The rDNA construct of claim 14, wherein the signal sequence is a milk protein
signal sequence and the third DNA sequence is fused to the second DNA sequence.

16. The rDNA construct of claim 15, wherein the third DNA sequence is selected
from the group consisting of:

(a) a DNA sequence comprising nucleotides 4923-5001 of SEQ ID NO: 1,
or its complementary sequence, and

(b) DNA sequences which are substantial equivalents of the sequences
defined in (a).

17. A transgenic mammal containing a rDNA construct in at least its mammary 
epithelial cells, the rDNA construct comprising 

(a) a first DNA sequence having the biological activity of a butyrophilin
promoter; and

(b) a second DNA sequence encoding a polypeptide operatively linked to
the first DNA sequence, the rDNA construct being integrated in such a
way that the second DNA sequence is expressed in the mammary gland 
of the transgenic mammal and the polypeptide is present in the milk of 
the mammal. 

18. The transgenic mammal of claim 17, wherein the first DNA sequence comprises
at least one minimal butyrophilin promoter region. 
19. The transgenic mammal of claim 18, wherein the minimal promoter region is
from the mouse Btn promoter, the bovine BTN promoter, or their substantial equivalents.

20. The transgenic mammal of claim 17, wherein the first DNA sequence is selected
from the group consisting of:

(a) a DNA sequence comprising nucleotides 1 to 4693 of SEQ ID NO: 1;
and 

(b) DNA sequences which are substantial equivalents of the sequence 
defined in (a). 

21. The transgenic mammal of claim 20, wherein the first DNA sequence further
comprises nucleotides 4694-4922 of SEQ ID NO: 1, or its complementary strand, contiguous
with the nucleotides 1 to 4693.

22. The transgenic mammal of claim 17, wherein the rDNA construct further
comprises a third DNA sequence encoding a signal sequence operatively linked between the 
first and second DNA sequences. 

23. The transgenic mammal of claim 22, wherein the signal sequence is a milk
protein signal sequence and the third DNA sequence is fused to the second DNA sequence.

24. The transgenic mammal of claim 17, wherein the rDNA construct is also present
in the germ cells and all the somatic cells of the transgenic mammal.

25. A method of producing a polypeptide comprising the steps of

(a) producing milk in a transgenic mammal, the mammal containing a
rDNA construct in at least its mammary epithelial cells, the rDNA
construct comprising
(i) a first DNA sequence having the biological activity of a Btn
promoter; and

(ii) a second DNA sequence encoding the polypeptide operatively
linked to the first DNA sequence; the rDNA construct being
integrated in such a way that the second DNA sequence is
expressed in the mammary gland of the transgenic mammal and
the polypeptide is present in the milk; and



(b) collecting the milk produced in step (a). 

26. The method of claim 25, further comprising: 

(c) removing the polypeptide from the collected milk. 

27. The method of claim 25, wherein the rDNA construct is also present in the germ 
cells and all the somatic cells of the transgenic mammal. 

28. A method for detecting a disease state associated with activation of a Btn
promoter in nonlactating mammals comprising detecting expression of butyrophilin mRNA or 
protein in a tissue of a nonlactating mammal. 

29. The method of claim 28 wherein the disease state is breast cancer and the tissue 
is breast cancer. 

30. A method for testing the carcinogenicity of a substance comprising comparing 
the level of expression of a reporter gene in a recombinant cell in the presence of the substance
with the level of expression of the reporter gene in the recombinant cell in the absence of the 
substance, the recombinant cell containing a rDNA construct comprising 

(a) the first DNA sequence having the biological activity of a butyrophilin
promoter; and

(b) a second DNA sequence encoding the reporter gene operatively linked to the
first DNA sequence.

31. A purified and isolated DNA fragment comprising a DNA sequence coding for
a polypeptide having the amino acid sequence of SEQ ID NO: 4.

32. A purified and isolated DNA fragment comprising a DNA sequence encoding 
mouse butyrophilin, wherein said DNA sequence comprises nucleotides 4694-13199. 

33. A purified and isolated DNA fragment comprising a DNA sequence coding for 
the promoter and transcriptional unit of the mouse butyrophilin gene, said DNA sequence 
obtained by a process comprising the steps of:

(a) growing λBtn1 (ATCC Deposit No. 97513) on a host bacteria strain to
generate a lysate of λBtn1 phage particles;

(b) concentrating the λBtn1 phage particles;

(c) extracting λBtn1 DNA from the concentrated phage particles; and

(d) sequencing the extracted λBtn1 DNA.
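
Claims 4 to 7 recite four nucleotide ranges of SEQ ID NO: 1, each required to be contiguous with the one before it. As an illustrative aid only, the Python sketch below checks that the recited 1-based inclusive ranges abut without gaps, that together they cover the 14180 nucleotides of the listing, and how each would map to a zero-based slice. The dictionary keys and function names are assumptions introduced for this example.

```python
# Length of SEQ ID NO: 1, taken from the final position index in the listing.
SEQ_ID_NO_1_LENGTH = 14180

# Nucleotide ranges as recited in the claims (1-based, inclusive).
CLAIMED_RANGES = {
    "claim 4": (1, 4693),
    "claim 5": (4694, 4922),
    "claim 6": (4923, 5001),
    "claim 7": (5002, 14180),
}


def span_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based inclusive range."""
    return end - start + 1


def as_slice(start: int, end: int) -> slice:
    """Convert a 1-based inclusive range to a zero-based Python slice."""
    return slice(start - 1, end)


def is_contiguous(ranges: list[tuple[int, int]]) -> bool:
    """True if each range starts exactly one position after the previous ends."""
    return all(nxt[0] == prev[1] + 1 for prev, nxt in zip(ranges, ranges[1:]))


ranges = list(CLAIMED_RANGES.values())
for claim, (start, end) in CLAIMED_RANGES.items():
    print(f"{claim}: {start}-{end} ({span_length(start, end)} nt)")

total = sum(span_length(start, end) for start, end in ranges)
print(is_contiguous(ranges), total == SEQ_ID_NO_1_LENGTH)  # True True
```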